(54) Optical disc and computer-readable storage medium, and recording method and apparatus
therefor



(57) An optical disc records video objects that are
obtained by multiplexing a video stream and an audio
stream. The audio stream is an arrangement of a plurality
of sets of audio frame data. Each video object unit in
a video object is an arrangement of packs that have
different payloads. The video stream and audio stream
are divided using a predetermined size and the resulting
data divisions are arranged into packs. At least one
video object unit includes packs where stuffing bytes or
a padding packet is arranged with part or all of a set of
audio frame data so that the boundary with the next
video object unit corresponds to a boundary between
sets of audio frame data. Since the boundary between
video object units is made to match a boundary between
sets of audio frame data, partial deletes that are
performed with a video object unit as the smallest unit
will not result in unnecessary parts of data remaining
on the optical disc.







EP 0 926 903 A1 






Description 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

[0001] The present invention relates to an optical disc
that records MPEG (Moving Picture Experts Group)
streams in which video streams and audio streams have
been multiplexed. The present invention also relates to
a recording apparatus, and a computer-readable storage
medium storing a recording program for the optical
disc.

2. Description of the Background Art

[0002] Many movie and home movie fans are not sat- 
isfied with merely viewing video images and want to 
freely edit the content of recorded images. 
[0003] When editing images, a user may delete an
unwanted section from an MPEG stream that has been
obtained by multiplexing one or more video streams and
audio streams. Users may also change the reproduction
order of an edited MPEG stream as desired.
[0004] File systems that handle MPEG streams in the same
way that a computer handles files have been subject to
increasing attention for their role in the realization of
the editing functions described above. The term "file
system" is a general name for a data construction for
managing the areas on a random access storage medium,
like a hard disk drive or an optical disc. As one example,
file systems standardized under ISO/IEC (International
Organization for Standardization/International
Electrotechnical Commission) 13346 are used to store
MPEG streams in files.
[0005] In such a file system, the files that store MPEG
streams are managed using management information
called directory files and file entries. Of these, a file
entry includes a separate allocation descriptor for each
extent that composes a file. Each allocation descriptor
includes a logical block number (LBN) showing the
recording position of an extent in the file and an extent
length showing the length of the extent. By updating the
logical block numbers (LBN) and extent lengths, logical
sectors on a disc medium can be set as "used" or
"unused". This enables the user to partially delete data
in units of logical sectors.
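
The descriptor update described in paragraph [0005] can be sketched as follows. This is a minimal illustration, not the file system's actual on-disc format: the names `Extent` and `delete_sectors` are hypothetical, and extent lengths are counted in whole logical sectors for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    lbn: int      # logical block number of the first sector of the extent
    sectors: int  # extent length in whole logical sectors (a simplification)


def delete_sectors(extent: Extent, first: int, count: int) -> List[Extent]:
    """Return the extents that remain after freeing `count` sectors
    starting at logical block `first`. Only descriptors change; no
    stream data is read or rewritten."""
    start, end = extent.lbn, extent.lbn + extent.sectors
    del_start, del_end = max(first, start), min(first + count, end)
    if del_start >= del_end:      # the deleted range misses this extent
        return [extent]
    remaining = []
    if del_start > start:         # sectors before the deleted area stay "used"
        remaining.append(Extent(start, del_start - start))
    if del_end < end:             # sectors after the deleted area stay "used"
        remaining.append(Extent(del_end, end - del_end))
    return remaining
```

Deleting from the start or end of an extent yields one shorter extent, while deleting from the middle yields two, so the file entry would gain one allocation descriptor in that case.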

[0006] When a user partially deletes an MPEG stream
where the minimum deletable unit is one logical sector
of 2,048 bytes, decoding may not be possible for the
resulting video stream and/or audio stream.
[0007] This problem is caused by the partial deletion
being performed without consideration of the actual
amount of MPEG stream data stored in each logical
sector. Under the DVD standard, data is recorded as
compressed MPEG streams according to the MPEG2 standard.
The data size of each pack to be recorded on a DVD is
set equal to the logical sector size.



[0008] As a result, one pack in an MPEG stream is 
recorded in each logical sector. Here, a pack refers to a 
unit of data in an MPEG stream. Under MPEG, video 
streams and audio streams are divided into data divi- 
sions of a predetermined size. These data divisions are 
then converted into packets. A grouping of one or more 
packets is a pack. Packs are given time stamps for data 
transfer of the MPEG stream, making packs the unit 
used for data transfer. On a DVD, there is one-to-one 
correspondence between packs and packets. In this 
data construction, one packet exists within each pack. 
Video packs store divided data for three kinds of picture 
data, namely, Intra (I), Predictive (P), and Bidirectionally
Predictive (B) pictures. An I picture results from
compression of an image using spatial frequency char- 
acteristics within the image, without referring to other 
images. A P picture results from compression of an 
image using correlation with preceding images. A B pic- 
ture results from compression of an image using corre- 
lation with both preceding and succeeding images. 
[0009] When a partial deletion operation updates the 
management information, video packs that store one 
frame of picture data may be partially deleted. If B pic- 
tures or P pictures that refer to the partially deleted 
frame of picture data remain, decoding of such pictures 
will no longer be possible. 

[0010] For audio, audio frame data for a plurality of 
frames is stored in one audio pack. Hereafter, the term 
"audio frame data" refers to the amount of audio data 
that is reproduced for one audio frame. This is generally 
called an "access unit". For an MPEG stream, this is the 
minimum unit for both decoding and reproduction out- 
put. 

[0011] To give specific examples, the Dolby-AC3 method
uses a frame length of 32msec for the encoded audio
stream, while MPEG uses a frame length of 24msec,
and LPCM (Linear Pulse Code Modulation) uses a
frame length of approximately 1.67msec (1/600sec to
be precise). Since the bitrate when decoding audio
frame data for Dolby-AC3 is 192Kbps, the size of one
set of audio frame data is 768 bytes
(32msec × 192Kbps ÷ 8 bits per byte).

[0012] When loading audio frame data into packs, the
payload size of a pack is subject to a maximum size of
2016 bytes. For Dolby-AC3, this is the non-integer value
of 2.625 times the audio frame data size. Since the payload
size is a non-integer multiple of the audio frame
data size, dividing the audio stream into units of the
payload size of the packs and storing the data divisions in
order in packs will result in certain sets of audio frame
data extending over a boundary between audio packs.
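
The arithmetic in paragraphs [0011] and [0012] can be checked with a short sketch. The function names are illustrative, 192Kbps is taken to mean 192,000 bits per second, and the stream is assumed to be cut into payload-sized divisions with no padding.

```python
FRAME_MS = 32          # Dolby-AC3 frame length given in the text
BITRATE_BPS = 192_000  # 192Kbps, assumed to mean 192,000 bits per second
PAYLOAD_BYTES = 2016   # maximum pack payload given in the text


def frame_bytes(frame_ms: int, bitrate_bps: int) -> int:
    """Size in bytes of one set of audio frame data."""
    return frame_ms * bitrate_bps // (1000 * 8)


def straddling_frames(frame_size: int, payload: int, n_frames: int) -> list:
    """Indices of frames whose data crosses a pack boundary when the
    stream is split into `payload`-sized divisions without padding."""
    result = []
    for i in range(n_frames):
        first_byte = i * frame_size
        last_byte = first_byte + frame_size - 1
        if first_byte // payload != last_byte // payload:
            result.append(i)
    return result


size = frame_bytes(FRAME_MS, BITRATE_BPS)
print(size, PAYLOAD_BYTES / size, straddling_frames(size, PAYLOAD_BYTES, 6))
```

With these figures every third frame (indices 2 and 5 of the first six) is split across two packs, which is the situation that a deletion in pack units cannot handle cleanly.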
[0013] The upper part of Fig. 1 shows example audio 
frames. In Fig. 1, each section between the "<" and ">"
symbols is an audio frame, with the "<" symbol showing 
the presentation start time and the ">" symbol showing 
the presentation end time. This notation for audio 
frames is also used in the following drawings. The audio 
frame data that should be reproduced (presented) for 
an audio frame is inputted into a decoder before the
presentation start time of the audio frame. This audio 
frame data should be taken out of the buffer by the 
decoder at the presentation start time. 
[0014] The lower part of Fig. 1 shows an example of 
how the audio frame data to be reproduced in each 
audio frame is stored in audio packs. In this figure, the 
audio frame data to be reproduced for audio frames f81 and
f82 is stored in audio pack A71, the audio frame data for
audio frame f84 is stored in audio pack A72, and the 
audio frame data for audio frame f86 is stored in audio 
pack A73. 

[0015] The audio frame data for audio frame f83 is
divided between the audio pack A71 that comes first 
and the audio pack A72 that comes later. In the same 
way, the audio frame data for audio frame f85 is divided 
between the audio pack A72 that comes first and the 
audio pack A73 that comes later. The reason the audio 
frame data to be reproduced for one audio frame is 
divided and stored in two audio packs is that the bound- 
aries between audio frames do not match the bounda- 
ries between packs. The reason that such boundaries 
do not match is that the data structure of packs under 
MPEG standard is totally unrelated to the data structure 
of audio streams. 

[0016] If a partial deletion operation in logical sector
(pack) units is performed by updating the file management
information while a set of audio frame data extends
over a pack boundary as shown in Fig. 1, any set of
audio frame data that extends over a pack boundary
marking a boundary for the partial deletion will be
split. As a result, one part of the audio frame data
will be located in a pack that is managed as "unused"
while the other part will be located in a pack that is
managed as "used". An example of a set of audio frame
data that extends over a pack boundary is the audio frame
data for audio frame f83 in Fig. 1.

[0017] The MPEG standard stipulates that a continuous
stream is reproduced from beginning to end and uses
a model where the unit for decoding is one set of audio 
frame data. Accordingly, a decoder for MPEG standard 
performs decoding under the premise that the begin- 
ning and end of the continuous stream are the bounda- 
ries of a set of audio frame data. As a result, there is no 
guarantee that a decoder will be able to correctly 
decode an audio stream that includes sets of audio 
frame data whose beginning or end is missing. This is 
due to the loss of some of the audio frame data needed 
for the decoding. 

[0018] To ensure that an MPEG stream can be prop- 
erly decoded after a partial deletion, it is necessary to 
first read the MPEG stream before the partial deletion, 
to separate the MPEG stream into video packs and 
audio packs, and to re-encode the video stream in the 
area outside the deleted area in a way that ensures 
decoding will be possible. This re-encoding equates to a 
reconstructing of GOPs. On the other hand, the audio 
stream that is no longer needed is merely discarded, 



and the remaining audio streams are not re-encoded.
Note that the discarded audio data includes the remaining
parts of partially deleted sets of audio frame data.
[0019] After re-encoding, the audio packs and video
packs are multiplexed again to produce an MPEG
stream. This is then recorded on the storage medium
and the management information is updated.
[0020] When partial deletion is performed in this way,
the analysis of MPEG streams, re-encoding and
re-multiplexing make hardware and software demands on a
reproduction apparatus. That is to say, recording and/or
reproduction apparatuses (hereinafter, "recording
apparatuses") that do not include the required hardware and
software are not able to perform partial deletion. Since
there is a great variety of recording apparatuses that
range from portable models to devices that are installed
in personal computers, it cannot be said that all such
recording apparatuses are equipped with the required
hardware and software.

[0021] In particular, many recording apparatuses that
are installed in personal computers are only equipped
with the hardware, software and file system that enable
the reproduction of MPEG streams. If such specific
hardware and software requirements exist for the
realization of partial deletion operations, only certain types
of recording apparatus will be able to perform partial
deletion. This greatly limits the opportunities that
users of optical discs will have to perform partial
deletion operations.

SUMMARY OF THE INVENTION 

[0022] It is a first object of the present invention to
provide an optical disc that enables reproduction apparatuses
that only have a function for updating management
information to perform the partial deletion of MPEG
streams. At the same time, the present invention aims to
provide a recording apparatus, a recording method, and
a recording program that record these MPEG streams
onto an optical disc.

[0023] The first object of the present invention can be
achieved by an optical disc for recording video objects
that are obtained by multiplexing a video stream including
a plurality of sets of picture data and an audio
stream including a plurality of sets of audio frame data,
each video object comprising a plurality of video object
units whose lengths are within a predetermined range,
and each video object unit storing complete sets of
picture data and complete sets of audio frame data.

[0024] With the stated construction, each video object
unit includes a plurality of complete sets of audio frame
data. Provided a partial deletion operation is performed
in units of video object units, there is no risk of a partial
deletion operation leaving a former or latter part of a set
of audio frame data on the optical disc. Since no
unwanted parts of audio frame data are left on the disc,
the partial deletion of video objects can be performed
without needing to re-encode the data on the optical
disc. Since the partial deletion operation can be completed
by merely updating the management information
in units of video object units, partial deletion operations
become possible for a wide variety of recording
apparatuses.
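
One way to picture the construction in paragraph [0024] is to pack only complete sets of audio frame data into each video object unit and pad the final audio pack. The sketch below is a simplified assumption: the function name `pack_vobu_audio` is hypothetical, the 2016-byte payload and 768-byte frame size are taken from the background section, and the choice between stuffing bytes and a padding packet as well as any decoder buffer simulation are ignored.

```python
PAYLOAD_BYTES = 2016  # maximum audio pack payload from the background section


def pack_vobu_audio(n_frames: int, frame_size: int,
                    payload: int = PAYLOAD_BYTES) -> list:
    """Split the audio of one video object unit (n_frames complete
    frames) into packs.

    Returns a list of (audio_bytes, padding_bytes) tuples. Only the
    final pack is padded, so the unit ends on an audio frame boundary.
    """
    total = n_frames * frame_size
    packs = []
    while total > 0:
        audio = min(total, payload)
        packs.append((audio, payload - audio))  # padding fills the rest
        total -= audio
    return packs


# Five 768-byte frames: the second pack carries 192 bytes of padding.
print(pack_vobu_audio(5, 768))
# Twenty-one frames fill eight packs exactly, so no padding is needed.
print(pack_vobu_audio(21, 768))
```

When the total audio size is an integer multiple of the payload size no padding is required; otherwise the last pack is short and the remainder is filled so that the next unit starts with a fresh set of audio frame data.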
[0025] Here, picture groups may be formed in the
video stream, each picture group including at least one
set of picture data that has been intra-encoded, and
each video object unit may include at least one complete
picture group.
[0026] With the stated construction, each video object
unit includes a plurality of video packs that compose a
picture group. A picture group includes a set of picture
data that has been intra-frame encoded, so that as long
as a recording apparatus performs a partial deletion
operation in units of video object units, no picture data
that depends on deleted data will be left on the optical
disc. As a result, proper reproduction is guaranteed for
the picture data that is left on the optical disc after the
partial deletion operation. This means that recording
apparatuses can perform partial deletion operations
simply by updating the management information
in video object units.
[0027] These and other objects, advantages and features
of the invention will become apparent from the following
description thereof taken in conjunction with the
accompanying drawings that illustrate a specific embodiment
of the invention. In the Drawings:

Fig. 1 shows how sets of audio frame data can
extend over pack boundaries;
Fig. 2A shows the outward appearance of a DVD-RAM
disc that is the recordable optical disc used in
the embodiments of the present invention;
Fig. 2B shows the recording areas on a DVD-RAM;
Fig. 2C shows the cross-section and surface of a
DVD-RAM cut at sector level;
Fig. 3A shows the zones 0 to 23 on a DVD-RAM;
Fig. 3B shows the zones 0 to 23 arranged into a
horizontal sequence;
Fig. 3C shows the logical sector numbers (LSN) in
the volume area;
Fig. 3D shows the logical block numbers (LBN) in
the volume area;
Fig. 4A shows the contents of the data recorded in
the volume area;
Fig. 4B shows an example data structure of a file
entry;
Fig. 5 shows a plurality of sets of picture data
arranged in display order and a plurality of sets of
picture data arranged in coding order;
Fig. 6A shows a detailed hierarchy of the logical
formats in the data construction of a VOB (Video
Object);
Fig. 6B shows the logical format of a video pack
that is arranged at the front of a VOBU;
Fig. 6C shows the logical format of a video pack
that is not arranged at the front of a VOBU;
Fig. 6D shows the logical format of a system
header;
Fig. 7A shows the logical format of an audio pack
for Dolby-AC3 methods;
Fig. 7B shows the logical format of an audio pack
for Linear-PCM methods;
Fig. 7C shows the logical format of an audio pack
for MPEG-Audio methods;
Fig. 7D shows the logical format of a pack header,
a packet header, and the audio frame information;
Fig. 8 is a graph showing the buffer state of the
audio decoder buffer;
Fig. 9A is a graph showing the buffer state of the
video buffer;
Fig. 9B is a graph showing the transfer period of
each set of picture data;
Fig. 10 shows how the audio packs that store the
audio frame data reproduced in a plurality of audio
frames and the video packs that store the picture
data reproduced in each video frame should be
recorded;
Fig. 11 shows how each set of audio frame data is
stored in the payload of each pack when the total
size of the payloads of the audio packs included in
a VOBU is an integer multiple of the audio frame
data size;
Fig. 12 shows how each set of audio frame data is
stored in each pack when the total size of the
payloads of the audio packs included in a VOBU is a
non-integer multiple of the audio frame data size;
Figs. 13A and 13B show examples of packs in
which padding packets and stuffing bytes have
respectively been inserted;
Fig. 14 shows a detailed hierarchy of the stored
content of the RTRW management file;
Fig. 15 shows how video fields are specified using
the C_V_S_PTM, C_V_E_PTM in the cell information;
Fig. 16 shows how VOBs are accessed using a
PGC;
Fig. 17 uses cross hatching to show the part, out of
the cells shown in Fig. 16, that corresponds to cells
subjected to partial deletion;
Fig. 18A shows which ECC blocks on a DVD-RAM
are freed to become unused areas as a result of a
partial deletion that uses PGC information #2;
Fig. 18B shows examples of the VOBs, VOB
information, and PGC information after a partial
deletion;
Figs. 19A and 19B show VOBU #i+1 and VOBU
#i+2 before and after a partial deletion;
Figs. 20A and 20B show VOBU #j+1 and VOBU
#j+2 before and after a partial deletion;
Fig. 21 shows an example configuration of a system
that uses the recording apparatus of the present
invention;
Fig. 22 is a block diagram showing the hardware
construction of the DVD recorder 70;
Fig. 23A shows the construction of the MPEG
encoder 2;
Fig. 23B shows the internal construction of the
system encoder 2e;
Fig. 24 is a representation of when a boundary
between VOBUs matches a boundary between
sets of audio frame data;
Fig. 25 is a representation of when a boundary
between VOBUs is made to match a boundary
between sets of audio frame data as a result of the
generation of an audio pack that transfers only a
remaining part of a set of audio frame data to the
audio decoder buffer;
Fig. 26A shows that a final set of audio frame data
is only partially stored when 4KB of audio frame
data is stored in the audio decoder buffer;
Fig. 26B shows the buffer state when control is
performed to prevent the audio decoder buffer from
becoming full;
Fig. 27 is a flowchart that shows the procedure by
which the audio packing unit 15 generates packs
while simulating the audio decoder buffer;
Fig. 28 is a flowchart showing the processing for the
partial deletion of a VOB;
Fig. 29A is a representation of when the deleted
area is positioned at the start of an extent;
Fig. 29B is a representation of when the deleted
area is positioned at the end of an extent;
Fig. 29C is a representation of when the deleted
area is positioned midway through an extent;
Fig. 30 shows the case where one set of audio
frame data is stored in each pack; and
Fig. 31 shows the changes in the buffer state that
are caused by the VOBUs shown in Fig. 30.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0028] The following is an explanation of an optical 
disc and a recording apparatus that are embodiments of 
the present invention. This explanation refers to the 
accompanying drawings.

(1-1) Physical Structure of a Recordable Optical Disc

[0029] Fig. 2A shows the external appearance of a
DVD-RAM disc that is a recordable optical disc. As
shown in this drawing, the DVD-RAM is loaded into a
recording apparatus having been placed into a cartridge 
75. This cartridge 75 protects the recording surface of 
the DVD-RAM, and has a shutter 76 that opens and 
closes to allow access to the DVD-RAM enclosed 
inside. 

[0030] Fig. 2B shows the recording area of a DVD-RAM
disc. As shown in the figure, the DVD-RAM has a
lead-in area at its innermost periphery, a lead-out area
at its outermost periphery, and a data area in between.
The lead-in area records the necessary reference signals
for the stabilization of a servo during access by an
optical pickup, and identification signals to prevent
confusion with other media. The lead-out area records the
same types of reference signals as the lead-in area.
The data area, meanwhile, is divided into sectors that
are the smallest unit for which access to the DVD-RAM
is possible. Here, the size of each sector is set at 2KB.
[0031] Fig. 2C shows the cross-section and surface of
a DVD-RAM cut at the header of a sector. As shown in
the figure, each sector includes a pit sequence that is
formed in the surface of a reflective film, such as a metal
film, and a concave-convex part.

[0032] The pit sequence includes 0.4µm to 1.87µm pits
that are carved into the surface of the DVD-RAM to
show the sector address.

[0033] The concave-convex part includes a concave
part called a "groove" and a convex part called a "land".
Each groove and land has a recording mark composed
of a metal film attached to its surface. This metal film is
capable of phase change, meaning that the recording
mark can be in a crystalline state or a non-crystalline
state depending on whether the metal film has been
exposed to a light beam. Using this phase change
characteristic, data can be recorded into the concave-convex
part. While it is only possible to record data onto the
land part of an MO (Magneto-Optical) disc, data can be
recorded onto both the land and the groove parts of a
DVD-RAM. This means that the recording density of a
DVD-RAM exceeds that of an MO disc. Error correction
information is provided on a DVD-RAM for each group
of 16 sectors. In this specification, each group of 16
sectors that is given an ECC (Error Correcting Code) is
called an ECC block.
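
The grouping of 16 sectors into an ECC block can be illustrated with a small sketch. It assumes, purely for illustration, that ECC blocks are aligned so that block n covers sectors 16n to 16n+15; the function names are hypothetical.

```python
SECTOR_BYTES = 2048          # 2KB sector size stated in the text
SECTORS_PER_ECC_BLOCK = 16   # sectors per ECC block stated in the text
ECC_BLOCK_BYTES = SECTOR_BYTES * SECTORS_PER_ECC_BLOCK


def ecc_block_of(sector: int) -> int:
    """Index of the ECC block containing `sector` (aligned blocks assumed)."""
    return sector // SECTORS_PER_ECC_BLOCK


def fully_covered_ecc_blocks(first_sector: int, count: int) -> list:
    """ECC blocks that lie entirely inside the sector range
    [first_sector, first_sector + count)."""
    # Round the start up and the end down to ECC block boundaries.
    first_block = -(-first_sector // SECTORS_PER_ECC_BLOCK)
    end_block = (first_sector + count) // SECTORS_PER_ECC_BLOCK
    return list(range(first_block, end_block))
```

Under this assumption a range of sectors frees a whole ECC block only when it covers all 16 of that block's sectors, which is why a freed range that starts or ends mid-block leaves the partial blocks at its edges in use.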

[0034] On a DVD-RAM, the data area is divided into
several zones to realize rotation control called Z-CLV
(Zone-Constant Linear Velocity) during recording
and reproduction.

[0035] Fig. 3A shows the plurality of zones provided
on a DVD-RAM. As shown in the figure, a DVD-RAM is
divided into 24 zones numbered zone 0 to zone 23. Each
zone is a group of tracks that are accessed using the
same angular velocity. In this embodiment, each zone
includes 1888 tracks. The rotational angular velocity of
the DVD-RAM is set separately for each zone, and is
higher the closer a zone is located to the inner periphery
of the disc. Division of the data area into zones ensures
that the optical pickup can move at a constant velocity
while performing access within a single zone. This
raises the recording density of the DVD-RAM and
facilitates rotation control during recording and reproduction.
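
As a rough sketch of the zone layout in this embodiment, the following maps a track to its zone and shows why the inner zones need a higher angular velocity. The track numbering (0 at the inner periphery, data area only) and the radius and velocity values in the usage lines are assumptions made for illustration, not figures from the text.

```python
ZONES = 24              # zones 0 to 23, as stated in the text
TRACKS_PER_ZONE = 1888  # tracks per zone in this embodiment


def zone_of_track(track: int) -> int:
    """Zone containing `track`, counting tracks from 0 at the inner
    periphery of the data area (assumed numbering)."""
    if not 0 <= track < ZONES * TRACKS_PER_ZONE:
        raise ValueError("track outside the zoned data area")
    return track // TRACKS_PER_ZONE


def angular_velocity(linear_velocity: float, radius: float) -> float:
    """Angular velocity (rad/s) giving `linear_velocity` at `radius`."""
    return linear_velocity / radius


# Hypothetical radii in metres for an inner and an outer zone.
inner = angular_velocity(8.0, 0.024)
outer = angular_velocity(8.0, 0.058)
print(zone_of_track(1888), inner > outer)
```

Holding the angular velocity fixed within a zone while stepping it down from zone to zone keeps the linear velocity roughly constant across the disc.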
[0036] Fig. 3B shows a horizontal arrangement of the
lead-in area, the lead-out area, and the zones 0-23 that
are shown in Fig. 3A.

[0037] The lead-in area and lead-out area each
include a defect management area (DMA). This defect
management area records position information showing
the positions of sectors that include defects and 
replacement position information showing whether the 
sectors used for replacing defective sectors are present 
in any of the replacement areas. 
[0038] Each zone has a user area, in addition to a 
replacement area and an unused area that are provided 
at the boundary with the next zone. A user area is an 
area that the file system can use as a recording area. 
The replacement area is used to replace defective sec- 
tors when such defective sectors are found. The unused 
area is an area that is not used for recording data. Each 
unused area only includes two tracks and is provided to 
prevent mistaken identification of sector addresses. The 
reason for this is that while sector addresses are
recorded at the same position in adjacent tracks within the
same zone, for Z-CLV the recording positions of sector
addresses are different for adjacent tracks at the
boundaries between zones.

[0039] In this way, sectors that are not used for data 
recording exist at the boundaries between zones. On a 
DVD-RAM, logical sector numbers (LSN) are consecu- 
tively assigned to physical sectors of the user area in 
order starting from the inner periphery. These LSN 
show only the sectors used for recording data. As 
shown in Fig. 3C, the area that records user data and 
includes sectors that have been assigned an LSN is 
called the volume area. 
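
The consecutive assignment of LSNs described in paragraph [0039] can be sketched as below. The sector labels and the function name `assign_lsns` are hypothetical; the sketch only shows that sectors outside the user area are skipped, so LSNs remain consecutive across zone boundaries.

```python
def assign_lsns(sector_kinds: list) -> dict:
    """Map physical sector index to LSN.

    `sector_kinds` lists the sectors in physical order from the inner
    periphery, each labelled "user", "replacement" or "unused"
    (hypothetical labels). Only user-area sectors receive an LSN.
    """
    mapping = {}
    lsn = 0
    for physical, kind in enumerate(sector_kinds):
        if kind == "user":
            mapping[physical] = lsn
            lsn += 1
    return mapping


# Two user sectors, a zone boundary, then one more user sector.
layout = ["user", "user", "replacement", "unused", "user"]
print(assign_lsns(layout))
```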

(1-2) Data Recorded in the Volume Area 

[0040] Fig. 4A shows the content of the data recorded
in the volume area of a DVD-RAM.
[0041] The volume area is used for recording AV files
that are each composed of a plurality of VOBs and an 
RTRW (RealTime Rewritable) management file that is 
the management information for the AV files. 
[0042] The fifth (lowest) level in Fig. 4A shows the 
video stream and audio stream. These streams are 
divided into units of the payload size of a packet, as shown on
the fourth level. Data divisions produced by this division 
are stored in video packs and audio packs according to 
MPEG standard. These packs are multiplexed into the 
video objects VOB #1, VOB #2 in the AV file shown on 
the third level. The AV file is divided into a plurality of 
extents according to ISO/IEC 13346, as shown on the 
second level. These extents are each recorded in an 
unused area in a zone area in the volume area, as 
shown on the top level. Note that none of the extents 
crosses a zone boundary. 

[0043] These AV files and RTRW management files 
are managed using directory files and file entries that 
have been standardized under ISO/IEC 13346. For the 
example shown in Fig. 4A, the AV file that stores VOB
#1, VOB #2, and VOB #3 is divided into the extents A, B,
C, and D. These extents are stored in zone areas, so
that the file entry for an AV file includes allocation
descriptors for the extents A, B, C, and D. The extents
produced by dividing an AV file are called AV blocks.
Each AV block has a data size that ensures that a data
underflow will not occur in a buffer, called a track buffer,
provided for disc access in a recording apparatus.

[0044] Fig. 4B shows an example data structure of a
file entry. In Fig. 4B, a file entry includes a descriptor
tag, an ICB tag, an allocation descriptor length,
expanded attributes, and allocation descriptors
corresponding to each of extents A, B, C, and D.

[0045] The descriptor tag is a tag showing that the
present entry is a file entry. For a DVD-RAM, a variety of
tags are used, such as the file entry descriptor and the
space bitmap descriptor. For a file entry, a value "261" is
used as the descriptor tag indicating a file entry.

[0046] The ICB tag shows attribute information for the
file entry itself.

[0047] The expanded attributes are information showing
the attributes with a higher-level content than the
content specified by the attribute information field in the
file entry.

[0048] The data construction of an allocation descriptor
is shown on the right-hand side of Fig. 4B. Each
allocation descriptor includes an extent length and a logical
block number that shows the recording start position of
the extent. The logical sectors on a DVD-RAM that are
occupied by an extent are managed as "used", while
logical sectors that are not occupied by a valid extent
are managed as "unused".
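
The file entry fields named in paragraphs [0044] to [0048] can be modelled roughly as follows. This is a simplified sketch rather than the on-disc layout: the class and function names are hypothetical, the ICB tag and expanded attributes are omitted, and the extent length is assumed to be in bytes and rounded up to whole 2,048-byte sectors.

```python
from dataclasses import dataclass, field
from typing import List, Set

SECTOR_BYTES = 2048    # logical sector size used in this document
FILE_ENTRY_TAG = 261   # descriptor tag value for a file entry, per the text


@dataclass
class AllocationDescriptor:
    extent_length: int  # length of the extent, assumed to be in bytes
    lbn: int            # logical block number where the extent starts


@dataclass
class FileEntry:
    allocation_descriptors: List[AllocationDescriptor] = field(default_factory=list)
    descriptor_tag: int = FILE_ENTRY_TAG


def used_sectors(entry: FileEntry) -> Set[int]:
    """Logical sectors occupied by the extents of `entry`, which the
    file system would manage as "used"."""
    used: Set[int] = set()
    for ad in entry.allocation_descriptors:
        n_sectors = -(-ad.extent_length // SECTOR_BYTES)  # round up
        used.update(range(ad.lbn, ad.lbn + n_sectors))
    return used


entry = FileEntry([AllocationDescriptor(4096, 10), AllocationDescriptor(2049, 20)])
print(sorted(used_sectors(entry)))
```

Any logical sector absent from the resulting set is not covered by a valid extent of this file, which is how shortening or removing a descriptor releases sectors.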

[0049] On the other hand, information relating to VOB
#1 to VOB #3 is recorded in the RTRW management file
as the VOB #1 information, the VOB #2 information, and
the VOB #3 information, as shown on the sixth level of
Fig. 4A. Like the AV files, the RTRW management file is
divided into a plurality of extents that are recorded in the
volume area.

(1-2-1) Video Stream

[0050] The video stream shown in Fig. 5 is an
arrangement of a plurality of sets of picture data that
each correspond to one frame of video images. This
picture data is a video signal according to NTSC
(National Television System Committee) or PAL
(Phase-Alternation Line) standard that has been
compressed using MPEG techniques. Sets of picture data
produced by compressing a video signal under NTSC
standard are displayed by video frames that have a
frame interval of around 33msec (1/29.97 seconds to be
precise). Sets of picture data produced by compressing
a video signal under PAL standard are displayed by
video frames that have a frame interval of 40msec. The
top level of Fig. 5 shows examples of video frames. In
Fig. 5, the sections enclosed between a pair of
symbols are video frames, with the opening symbol showing
the presentation start time (Presentation_Start_Time)
for each video frame and the closing symbol showing the
presentation end time (Presentation_End_Time). This
notation for video frames is also used in the following
drawings. The sections enclosed by these symbols
each include a plurality of video fields.
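
The frame intervals quoted above follow directly from the frame rates. The sketch below uses exact fractions to reproduce them; the 25 frames per second rate for PAL is inferred from the 40msec interval, and the function names are illustrative.

```python
from fractions import Fraction

NTSC_FRAME_RATE = Fraction(2997, 100)  # 29.97 frames per second
PAL_FRAME_RATE = Fraction(25)          # implied by the 40msec interval


def frame_interval_ms(rate: Fraction) -> float:
    """Duration of one video frame in milliseconds."""
    return float(1000 / rate)


def presentation_times(n_frames: int, rate: Fraction) -> list:
    """(start, end) presentation times in seconds for consecutive frames,
    assuming the first frame starts at time 0."""
    return [(Fraction(i) / rate, Fraction(i + 1) / rate)
            for i in range(n_frames)]


print(round(frame_interval_ms(NTSC_FRAME_RATE), 3))  # about 33.367
print(frame_interval_ms(PAL_FRAME_RATE))             # 40.0
```

Each frame's presentation end time equals the next frame's presentation start time, which matches the back-to-back video frames drawn on the top level of Fig. 5.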
[0051] Compression according to MPEG standards 
uses the spatial frequency characteristics within the 
image of one frame and the time-related correlation with 
images that are displayed before or after the frame. 
Each set of picture data is converted into one of a
Bidirectionally Predictive (B) Picture, a Predictive (P) Picture,
or an Intra (I) Picture. Fig. 5 shows B pictures, P
pictures, and I pictures as all having the same size, 
although there is in fact great variation in their sizes. 
[0052] When decoding B pictures or P pictures that 
use the time-related correlation between frames, it is 
necessary to refer to the images that are reproduced 
before or after the picture being decoded. For example, 
all images referred to by a B picture need to be com- 
pletely decoded before the decoding of the B picture 
can be performed. 

[0053] As a result, an MPEG video stream defines the 
coding order of pictures in addition to defining the dis- 
play order of the pictures. In Fig. 5, the second and third 
levels respectively show the sets of picture data 
arranged in display order and in coding order. 
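
The relation between display order and coding order can be sketched with a simple reordering rule: each I or P picture is emitted before the B pictures that precede it in display order, because those B pictures refer to it. This is a simplified illustration with a hypothetical function name and picture labels, not the encoder's actual procedure.

```python
def display_to_coding_order(display: list) -> list:
    """Reorder picture labels such as "I3" or "B1" from display order
    into coding order. The first character of each label gives the
    picture type."""
    coding, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)     # must wait for the next I or P picture
        else:
            coding.append(pic)        # the reference picture goes first
            coding.extend(pending_b)  # then the B pictures that refer to it
            pending_b = []
    coding.extend(pending_b)          # trailing B pictures, if any
    return coding


print(display_to_coding_order(["B1", "B2", "I3", "B4", "B5", "P6"]))
```

In this example the I picture moves to the front of the coding order even though two B pictures are displayed before it.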
[0054] When a sequence of only B pictures and P pic- 
tures is used, problems can be caused by special repro- 
duction features that perform decoding starting midway 
through the video stream. To prevent such problems, an 
I picture is inserted into the video data at 0.5s intervals. 
Each sequence of picture data starting from an I picture 
and continuing as far as the next I picture is a GOP 
(Group Of Pictures). Such GOPs are defined as the unit 
for MPEG compression. On the third level of Fig. 5, the 
dotted vertical line shows the boundary between the 
present GOP and the following GOP. In each GOP, the 
picture type of the last picture data in the display order 
is usually a P picture, while the picture type of the first 
picture data in the coding order is always an I picture. 
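
Since every GOP begins with an I picture in coding order, a coded sequence can be divided into GOPs by starting a new group at each I picture. The sketch below assumes picture labels whose first character is the picture type; the function name is hypothetical.

```python
def split_into_gops(coding_order: list) -> list:
    """Group picture labels in coding order into GOPs, opening a new
    GOP at every I picture."""
    gops = []
    for pic in coding_order:
        if pic[0] == "I" or not gops:
            gops.append([])   # an I picture opens a new GOP
        gops[-1].append(pic)
    return gops


stream = ["I3", "B1", "B2", "P6", "B4", "B5", "I9", "B7", "B8"]
print(split_into_gops(stream))
```

With an I picture inserted every 0.5 seconds, an NTSC stream at 29.97 frames per second would hold roughly 15 pictures in each GOP.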

(1-2-2) Data Structure of VOBs

[0055] The VOBs (Video Objects) #1, #2, #3 ... shown
in Fig. 4A are program streams under ISO/IEC 13818-1 
that are obtained by multiplexing a video stream and 
audio stream. VOBs do not have a program_end_code 
at the end. 

[0056] Fig. 6A shows the detailed hierarchy for the 
logical construction of VOBs. This means that the logi- 
cal format located on the top level of Fig. 6A is shown in 
more detail in the lower levels. 
[0057] The video stream that is located on the top 
level in Fig. 6A is shown divided into a plurality of GOPs 
on the second level. These GOPs are the same as in 
Fig. 5, so that the picture data in GOP units has been 
converted into packs. The audio stream shown on the 
right of the top level in Fig. 6A is converted into packs on 
the third level, in the same way as in Fig. 5. The divided 
picture data for a GOP unit is multiplexed with the audio 
stream that has been divided in the same way. This produces
the pack sequence on the fourth level of Fig. 6A.
This pack sequence forms a plurality of VOBUs (Video
Object Units) that are shown on the fifth level. The VOBs
(Video Objects) shown on the sixth level are composed
of a plurality of these VOBUs arranged in a time series.
In Fig. 6A, the broken guidelines show the relations
between the data in the data structures on adjacent levels.
From the guidelines in Fig. 6A, it can be seen that
the VOBUs on the fifth level correspond to the pack
sequence on the fourth level and the picture data in
GOP units on the second level.
[0058] As can be seen by tracing the guidelines, each
VOBU is a unit that includes at least one GOP that has
picture data with a reproduction period of around 0.4 to
1.0 second, as well as audio frame data that a recording
apparatus should read from the DVD-RAM at the same
time as this picture data. The unit called a GOP is
defined under the MPEG Video Standard (ISO/IEC 13818-2).
Since a GOP only specifies picture data, as shown
on the second level of Fig. 6A, the audio data and other
data (such as sub-picture data and control data) that are
multiplexed with this picture data are not part of the
GOP. Under the DVD-RAM standard, the expression
"VOBU" is used for a unit that corresponds to a GOP,
and is the general name for at least one GOP including
picture data with a reproduction period of around 0.4 to
1.0 second and the audio data that has been multiplexed
with this picture data.

[0059] The arrangement of video packs and audio
packs in a VOBU is recorded as it is as a sequence of
logical sectors on a DVD-RAM. Accordingly, the data
stored in these packs will be read from the DVD-RAM in
this order. This means that this arrangement of video
packs and audio packs is the order in which the data
inside the packs is read from a DVD-RAM. Each video
pack has a storage capacity of around 2KB. Since the
data size of the video stream in one VOBU can be several
hundred kilobytes, the video stream will be divided
into several hundred video packs.

[0060] The following is an explanation of how a
recording apparatus identifies the start of a VOBU. In
Fig. 6A, a system header h1 is given, with the arrows
that extend from this system header indicating the video
packs located at the start of each VOBU. This system
header includes a variety of parameters that are
required when decoding streams. The arrows show that
a system header is stored in the first pack in each
VOBU. These system headers act as separators
between VOBUs in the data sequence.
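
The use of system headers as VOBU separators can be sketched as below. This is a rough illustration under stated assumptions rather than the apparatus's actual procedure: it assumes each pack occupies exactly one 2,048-byte sector, uses the MPEG-2 program stream start codes (0x000001BA for a pack, 0x000001BB for a system header) and the 14-byte pack header layout of ISO/IEC 13818-1, and the helper `make_pack` only builds synthetic test data.

```python
PACK_SIZE = 2048  # assumption: one pack per 2 KB logical sector
PACK_START_CODE = b"\x00\x00\x01\xba"
SYSTEM_HEADER_START_CODE = b"\x00\x00\x01\xbb"


def vobu_start_offsets(stream: bytes):
    """Return byte offsets of packs whose pack header is followed by a system header."""
    offsets = []
    for off in range(0, len(stream) - PACK_SIZE + 1, PACK_SIZE):
        pack = stream[off:off + PACK_SIZE]
        if pack[:4] != PACK_START_CODE:
            continue  # not a pack boundary; skip in this sketch
        stuffing = pack[13] & 0x07  # pack_stuffing_length (low 3 bits of byte 13)
        pos = 14 + stuffing         # first byte after the pack header
        if pack[pos:pos + 4] == SYSTEM_HEADER_START_CODE:
            offsets.append(off)
    return offsets


def make_pack(with_system_header: bool) -> bytes:
    """Build a synthetic pack (hypothetical test data, not a valid stream)."""
    header = PACK_START_CODE + bytes(9) + b"\xf8"  # 14 bytes, stuffing length 0
    body = SYSTEM_HEADER_START_CODE if with_system_header else b"\x00\x00\x01\xe0"
    return (header + body).ljust(PACK_SIZE, b"\xff")


stream = make_pack(True) + make_pack(False) + make_pack(False) + make_pack(True)
print(vobu_start_offsets(stream))
```

Only the first and fourth synthetic packs carry a system header, so only their offsets are reported as VOBU starts.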

(1-2-2-1) Data Construction of the Audio Packs 

[0061] Fig. 6B shows the logical format of a video 
pack arranged at the start of a VOBU. As shown in Fig. 
6B, the first video pack in a VOBU is composed of a
pack header, a system header, a packet header, and 
video data that is part of the video stream. 
[0062] Fig. 6C shows the logical format of the video 
packs that do not come first in the VOBU. As shown in
Fig. 6C, these video packs are each composed of a
pack header, a packet header, and video data, with no
system header.

[0063] Fig. 6D shows the logical format of the system 
header. The system header shown in Fig. 6D is only 
appended to the video pack that is located at the start of 
a VOBU. This system header includes maximum rate
information (shown as the "Rate.bound.info" in Fig. 6D)
and buffer size information (shown as
"Buffer.bound.info"). The maximum rate information
shows the transfer rate to be requested of the reproduction
apparatus when inputting the data. The buffer size
information shows the highest buffer size to be requested
of the reproduction apparatus when inputting the data in the
VOB.

[0064] The following is a description of the data con- 
struction of each pack. Note that the data construction 
of video packs is not part of the gist of the present inven- 
tion. Accordingly, only the data construction of audio 
packs will be explained. 

[0065] Fig. 7A shows the logical format of an audio
pack for Dolby-AC3 format. As shown in Fig. 7A, each
audio pack includes a pack header, a packet header, a
sub_stream_id showing whether the compression technique
for the audio stream in this pack is Linear-PCM or
Dolby-AC3, audio frame information, and a plurality of
sets of audio frame data that have been compressed
using the compression technique indicated by the
sub_stream_id.

[0066] Fig. 7B shows the logical format of an audio 
pack for Linear-PCM methods. As shown in Fig. 7B, 
each Linear-PCM audio pack has the same elements as 
a Dolby-AC3 audio pack with the addition of audio frame 
data information. This audio frame data information 
includes the following: 

1. an audio_emphasis_flag showing whether 
emphasis is on or off; 

2. an audio_mute_flag showing whether an audio 
mute is on or off; 

3. an audio_frame_number for writing a frame 
number of the audio frame that is the first audio 
frame in the pack in an audio frame group (GOF); 

4. a quantization_word_length showing the word 
length when an audio frame sample has been 
quantized; 

5. an audio_sample_length showing the audio sampling
frequency;

6. a number_of_audio_channels that may be set at 
monaural, stereo, and dual monaural; and 

7. a dynamic_range_control that compresses the 
dynamic_range starting from the first access unit. 

[0067] Fig. 7C shows the logical format of audio packs
under MPEG-Audio methods. As shown in Fig. 7C,
each pack of MPEG-Audio has the same elements as
the packs in Dolby-AC3, but with no sub_stream_id or
audio frame data information.

[0068] Fig. 7D shows the logical format of a pack
header, a packet header, and the audio frame information.

[0069] The pack header shown in Fig. 7D includes a
Pack_Start_Code, an SCR (System Clock Reference),
and a Program_mux_rate. Of these, the SCR shows the
time at which the audio frame data in the present pack
should be inputted into the decoder buffer (hereinafter,
the "audio decoder buffer") provided for the audio
stream. In a VOB, the first SCR is the initial value of the
STC (System Time Clock) that is provided as a standard
feature in a decoder under the MPEG standard.

[0070] As shown in Fig. 7D, the packet header
includes a "packet_start_code_prefix" that is the first
code in a packet, a "stream_ID" that is set at the fixed
value for a private stream, and a PTS (Presentation
Time Stamp) that shows at what time the audio frame
data should be outputted.

[0071] The audio frame data information includes the
"number_of_frame_headers" that gives the number of
audio frames in the present audio pack and the
"first_access_pointer" that gives the relative number of
blocks between this audio frame data information and
the first byte in the first access unit (audio frame).

(1-2-2-2) Buffer State of the Audio Decoder Buffer

[0072] The following is an explanation of the changes
in the internal state of the audio decoder buffer when a 
PTS or SCR is assigned to a pack header or packet 
header. 

[0073] Fig. 8 is a graph showing the buffer state of the
audio decoder buffer. In this figure, the vertical axis
represents buffer occupancy and the horizontal axis
represents time.

[0074] The gradient of the inclined sections k11, k12,
and k13 in Fig. 8 represents the transfer rate of an audio
pack. This transfer rate is the same for each audio pack.
The respective heights of the inclined sections k11, k12,
and k13 show the amount of audio frame data that is
transferred to the audio decoder buffer by each audio
pack. Overall, the payload of each audio pack will be
filled with audio frame data, so that the height of each of
the inclined sections k11, k12, and k13 is 2,016 bytes.
[0075] The respective widths of the inclined sections
k11, k12, and k13 show the transfer period of one pack,
while the respective start positions of the inclined
sections k11, k12, and k13 in the horizontal axis show the
SCR assigned to each pack.

[0076] For the example of Dolby-AC3, the transfer rate
to the audio decoder buffer is 384Kbps for two audio
streams and 192Kbps for one audio stream. The payload
size of each pack is 2,016 bytes, so that the transfer
period for one pack is 2msec (=2,016
bytes*8/8Mbps). This means that the transfer of the
2,016 bytes of audio frame data in the payload of one
pack is completed in around 0.0625 (=2msec/32msec)
times the reproduction period of the pack.
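
The arithmetic in this paragraph can be checked with a few lines of code. The figures (2,016-byte payload, 8Mbps transfer, 32msec period) are taken from the text; the variable names are illustrative only. The unrounded results are slightly larger than the rounded 2msec and 0.0625 quoted above.

```python
PAYLOAD_BYTES = 2016           # payload of one audio pack
TRANSFER_RATE_BPS = 8_000_000  # 8 Mbps, as used in the text
PERIOD_S = 0.032               # 32 msec, as used in the text

# Time needed to move one pack payload into the audio decoder buffer.
transfer_period_s = PAYLOAD_BYTES * 8 / TRANSFER_RATE_BPS

# Fraction of the 32 msec period taken up by the transfer.
ratio = transfer_period_s / PERIOD_S

print(f"transfer period: {transfer_period_s * 1000:.3f} msec")
print(f"ratio: {ratio:.4f}")
```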
[0077] The stepped parts d1, d2, and d3 show the 
reductions in buffer occupancy of the audio decoder 
buffer due to the outputting and decoding of accumu- 
lated audio frame data at the respective presentation 
start times of audio frames represented by the audio 
frame data. The positions of the stepped parts d1, d2, 
and d3 in the horizontal axis show the PTS assigned to 
each pack. 

[0078] The audio pack A31 shown in Fig. 8 stores the 
audio frame data A21, A22, and A23 that should be 
decoded at the presentation end times of the audio 
frames f20, f21, and f22. Of these sets of audio frame 
data, the audio frame data A21 is decoded at the pres- 
entation start time of the audio frame f21, before the 
audio frame data A22 and A23 are respectively 
decoded at the presentation start times of the audio 
frames f22 and f23. 

[0079] Of the audio frames stored in the audio pack
A31, the audio frame data A21 is the first to be decoded.
This audio frame data should be decoded at the presentation
start time of the audio frame f21, so that the audio
pack A31 needs to be read from the DVD-RAM by the
end of the presentation period of the audio frame f20.
Consequently, the audio pack A31 that includes the
audio frame data A21, A22, and A23 is given an SCR
that shows an input time that precedes the presentation
start time of the audio frame f21.

(1-2-2-3) Buffer State for the Video Stream 

[0080] The following is an explanation of the changes 
in the internal state of a decode buffer provided for video 
streams (hereinafter, the "video buffer") due to the 
assigning of the time stamps PTS, DTS, and SCR in 
pack headers and packet headers. 
[0081] Video streams are encoded with variable code
length due to the large differences in code size between
the different types of pictures (I pictures, P pictures, and
B pictures) used in compression methods that use
time-related correlation. Video streams also include
large amounts of data, so that it is difficult to complete the
transfer of the picture data to be reproduced, especially 
the picture data for an I picture, between the decoding 
time of the video frame that was decoded immediately 
before and the decoding start time of this I picture, 
which is to say during the reproduction period of one 
video frame. 

[0082] Fig. 9A is a graph showing video frames and 
the occupancy of the video decoder buffer. In Fig. 9A, 
the vertical axis represents the occupancy of the video 
decoder buffer, while the horizontal axis represents
time. This horizontal axis is split into 33msec sections 
that each match the reproduction period of a video 
frame under NTSC standard. By referring to this graph, 
it can be seen that the occupancy of the video decoder 
buffer changes over time to exhibit a sawtooth pattern. 



[0083] The height of each triangular tooth that composes
the sawtooth pattern represents the amount of
data in the part of the video stream to be reproduced in
each video frame. As mentioned before, the amount of
data in each video frame is not equal, since the amount
of code for each video frame is dynamically assigned
according to the complexity of the frame.
[0084] The gradient of each triangular tooth shows the
transfer rate of the video stream. The approximate
transfer rate of the video stream is calculated by
subtracting the output rate of the audio stream from the
output rate of the track buffer. This transfer rate is the same
during each frame period.

[0085] During the period corresponding to one triangular
tooth in Fig. 9A, picture data is accumulated with
a constant transfer rate. At the decode time, the picture
data for the present frame is instantly outputted from the
video decoder buffer. The reason a sawtooth pattern is
achieved is that the processing from the storage in the
video decoder buffer to output from the video decoder
buffer is continuously repeated. The DTS given to each
video pack shows the time at which the video data
should be outputted from the video decoder buffer.
[0086] As shown in Fig. 9A, to maintain the image
quality of complicated images, larger amounts of code
need to be assigned to frames. When a larger amount
of code is assigned to a frame, this means that the
pre-storage of data in the video decoder buffer needs to be
commenced well before the decode time.

[0087] Normally, the period from the transfer start
time, at which the transfer of picture data into the video
decoder buffer is commenced, to the decode time for
the picture data is called the VBV (Video Buffering Verifier)
delay. In general, the more complex the image, the
larger the amount of assigned code and the longer the
VBV delay.

[0088] As can be seen from Fig. 9A, the transfer of the
picture data that is decoded at the decode time T16
starts at time T11. The transfer of picture data that is
decoded at the decode time T18, meanwhile, starts at
time T12. The transfer of the other sets of picture data
that are decoded at times T14, T15, T17, T19, T20, and
T21 can similarly be seen to start before these decode
times.

(1-2-2-4) Transfer Period of Each Set Of Picture Data 

[0089] Fig. 9B shows the transfer of sets of picture
data in more detail. When considering the situation in
Fig. 9A, the transfer of the picture data to be decoded at
time T24 in Fig. 9B needs to be completed in the
"Tf_Period" between the start time T23 of the "VBV
delay" and the start of the transfer of the next picture
data to be reproduced. The increase in the occupancy
of the buffer that occurs from this Tf_Period onwards is
due to the transfer of the following picture data.
[0090] The start time of the Tf_Period approximately
equates to the SCR given in the first pack out of the
packs that store divisions of the corresponding picture
data. The end time of the Tf_Period approximately
equates to the SCR given to the first pack out of the packs
that store divisions of the next picture data. This means
that a Tf_Period is defined by the SCRs assigned to
video packs.
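
As a small sketch of this relationship, each Tf_Period can be derived from the SCRs of consecutive first packs. The picture names and the SCR values (given here in arbitrary integer ticks) are hypothetical.

```python
def tf_periods(first_pack_scrs):
    """Map each picture to the (start, end) of its transfer period.

    first_pack_scrs: list of (picture_name, scr_of_first_pack) in transfer order.
    The last picture has no following SCR here, so it gets no entry.
    """
    periods = {}
    for (name, start), (_next_name, end) in zip(first_pack_scrs, first_pack_scrs[1:]):
        periods[name] = (start, end)
    return periods


scrs = [("A", 0), ("B", 20), ("C", 45)]  # hypothetical SCR ticks
print(tf_periods(scrs))
```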

[0091] The picture data accumulated in the video 
decoder buffer waits until the time T24 at which the pic- 
ture data is to be decoded. At the decode time T24, the 
image A is decoded, which clears part of the picture 
data stored in the video decoder buffer, and thereby 
reduces the total occupancy of the video decoder buffer. 
[0092] When considering the above situation, it can be 
seen that while it is sufficient for the transfer of audio 
frame data to start one frame in advance, the transfer of 
picture data needs to start well before the decode time 
of such picture data. In other words, the transfer of pic- 
ture data should start well before the transfer of audio 
frame data that is decoded at approximately the same 
time. Putting this another way, when the audio stream 
and video stream are multiplexed into an MPEG stream, 
audio frame data is multiplexed with picture data that 
has a later decode time. As a result, the picture data 
and audio frame data in a VOBU are in fact composed 
of audio frame data and picture data that will be 
decoded after the audio frame data. 

(1-2-2-5) Arrangement of Video Data and Audio Frame 
Data in Each Pack 

[0093] Fig. 10 shows how audio packs that store a plurality
of sets of audio frame data and video packs that
store a plurality of sets of picture data may be arranged.
In Fig. 10, audio pack A31 stores the sets of audio
frame data A21, A22, and A23 that are to be reproduced
for f21, f22, and f23. Of the sets of audio frame data in
the audio pack A31, the first audio frame data to be
decoded is the audio frame data A21. Since the audio
frame data A21 needs to be decoded at the presentation
end time of the audio frame f20, this audio data A21
needs to be multiplexed with the picture data V11 that is
transferred during the same period (period k11) as the
audio frame f20. As a result, the audio pack A31 is
arranged near the video packs that store the picture
data V11, as shown at the bottom of Fig. 10.
[0094] The audio pack A32 storing the sets of audio 
frame data A24, A25, and A26 that are respectively 
reproduced for f24, f25, and f26 should be multiplexed 
with the picture data V15 that is transferred at the same 
time (period k15) as the audio frame f23. As a result, the 
audio pack A32 is arranged near the video packs that 
store the picture data V15, as shown at the bottom of 
Fig. 10. 

(1-2-2-6) Arrangement of Packs Near a VOBU Boundary

[0095] Since a VOBU is a data unit that includes one
GOP, it can be understood that VOBU boundaries are
determined based on GOP boundaries. When this is the
case, a first problem is the amount of audio frame data
stored in one VOBU. As shown in Fig. 10, the audio
packs that store sets of audio frame data are arranged
so as to be near video packs that store picture data that
is reproduced sometime after the audio frame data. This
means that the audio frame data that should be inputted
into the decoder buffer at the same time as a GOP is
stored in the same VOBU as the GOP.

[0096] A second problem is how to align the boundaries
of sets of audio frame data with the boundaries of
VOBUs since VOBUs are fundamentally determined
based on GOPs. As stated earlier, each set of picture
data is compressed using variable length encoding, so
that GOPs have different sizes. Because of this, the
number of audio packs that will be inputted into the
decoder buffer at approximately the same time as the
video packs for a GOP will vary between VOBUs. As a
result, in a VOB, some VOBUs have a payload for audio
packs whose total size corresponds to an integer
number of audio packs, while other VOBUs have a payload
for audio packs whose total size corresponds to a
non-integer number of audio packs. Ignoring differences
in the number of audio packs, to align the boundaries of
VOBUs with boundaries between sets of audio frame
data, the arrangement of packs near the boundaries of
video object units will differ between cases where the
total size of the payload for audio packs corresponds to
an integer number of sets of audio frame data and
cases where the total size corresponds to a non-integer
number of sets of audio frame data.
[0097] Fig. 11 shows how each set of audio frame
data is stored into each pack when the total size of the
payload for audio packs in a VOBU is an integer number
of sets of audio frame data.

[0098] The boxes drawn on the top level of Fig. 11
show B pictures, P pictures, and I pictures included in
the video stream. The second level shows the division
of the video stream on the top level into units with the
same size as the payloads of packs. The arrows that
extend downward from the second level show how the
data divisions obtained by the division into payload size
are stored in the video packs.

[0099] The example waveform shown on the fifth level
in Fig. 11 shows an audio wave obtained by sampling at
a sampling frequency of 48kHz. The fourth level shows
a sequence of sets of audio frame data. The sampled
data obtained through the sampling is divided into groups
of 1536 (=32msec/(1/48kHz)) samples to form audio access
units (AAU). These AAUs are encoded to produce the sets of
audio frame data shown on the fourth level. The
correspondence between the sampled data and the sets of
audio frame data is shown by the dotted lines that
extend upward from the fifth level. Meanwhile, the dotted
lines that extend upward from the fourth level show
the storage of the sets of audio frame data into audio
packs.
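
The sample count per audio access unit follows directly from the figures in the text. The sketch below also derives a frame size in bytes, which is not stated in the text: it is an assumption that holds only for a constant 192Kbps stream, and it is included to show why sets of audio frame data do not generally fit a whole number of times into a 2,016-byte pack payload.

```python
SAMPLING_HZ = 48_000    # sampling frequency from the text
FRAME_PERIOD_MS = 32    # duration of one audio frame from the text
BITRATE_BPS = 192_000   # assumption: constant-bitrate single stream
PAYLOAD_BYTES = 2016    # audio pack payload from the text

# Samples grouped into one audio access unit (AAU).
samples_per_frame = SAMPLING_HZ * FRAME_PERIOD_MS // 1000

# Derived size of one set of audio frame data (assumption, see lead-in).
frame_bytes = BITRATE_BPS * FRAME_PERIOD_MS // 1000 // 8

# How many sets of audio frame data fit in one pack payload.
frames_per_pack = PAYLOAD_BYTES / frame_bytes

print(samples_per_frame, frame_bytes, frames_per_pack)
```

Under these assumptions a payload holds a non-integer number of frames, so a frame is often split across two packs, as with y-3 in Fig. 11.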
[0100] The vertical line on the top level that shows the
boundary between the B picture v15 and the I picture
v16 is a boundary between GOPs. The video pack that
includes the picture data located immediately before
this GOP boundary is shown as the video pack P31.
[0101] The audio pack P32 located immediately 
before this video pack P31 is indicated by the arrows 
that extend from the sets of audio frame data y-1, y-2, 
and the latter part of y-3, showing that this pack stores 
these sets of audio frame data. Meanwhile, the audio 
pack P35 that is located before this audio pack P32 is 
indicated by the arrows that extend from the sets of 
audio frame data y-5, y-4, and the former part of y-3, 
showing that this pack stores these sets of audio frame 
data. 

[0102] Fig. 12 shows how sets of audio frame data are
stored in each pack when the total size of the payload 
for audio packs included in a VOBU does not corre- 
spond to an integer number of sets of audio frame data. 
[0103] The top level and second level in Fig. 12 are the
same as in Fig. 11. The third level differs from that in
Fig. 11 in that the audio pack P33 is located immediately
after the video pack P31. The correspondence
between the sets of audio frame data shown on the
fourth level and pack sequence shown on the third level
is also different from that shown in Fig. 11.
[0104] The audio pack P32 that is located immediately
before this video pack P31 is indicated by the arrows
that extend from the sets of audio frame data x-3, x-2,
and the former part of x-1, showing that this pack stores
these sets of audio frame data. Meanwhile, the audio 
pack P33 that is located immediately after this video 
pack P31 is indicated by the arrows that extend from the 
latter part of the set of audio frame data x-1, showing 
that this pack stores this audio frame data. Since only 
the latter part of the audio frame data x-1 is stored, an 
area in the payload of the audio pack P33 is left unused. 
To fill this remaining area, the padding packet P51 is 
inserted into the audio pack P33. 
[0105] Since the latter part of the audio frame data x-1
and the padding packet are arranged into the audio
pack P33, the boundary of the VOBUs matches the
boundaries between the sets of audio frame data.
[0106] In this way, it is ensured that the boundaries of
VOBUs match a boundary between the sets of audio 
frame data, regardless of whether the total payload size 
of the audio packs included in the VOBUs corresponds 
to an integer number of sets of audio frame data or a 
non-integer number. This means that if partial delete 
operations are performed with a VOBU as the smallest 
deletable unit of data, the boundary between the 
deleted data and the remaining data will match a bound- 
ary between sets of audio frame data. 
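
The two cases of Figs. 11 and 12 can be sketched as follows. This is a simplified model and not the recording procedure itself: it assumes 768-byte sets of audio frame data (a figure derived from 192Kbps and 32msec, not stated in the text), a 2,016-byte payload, and it treats the fill as a plain byte count, ignoring the header overhead of a padding packet.

```python
def pack_audio_frames(frame_sizes, payload=2016):
    """Distribute the audio frame data of one VOBU over audio packs.

    Returns a list of (audio_bytes, fill_bytes) per pack. Only the last
    pack can have fill_bytes > 0; that fill stands in for the padding
    packet or stuffing bytes that keep the VOBU boundary on a frame boundary.
    """
    remaining = sum(frame_sizes)
    packs = []
    while remaining > 0:
        audio = min(payload, remaining)
        packs.append((audio, payload - audio))
        remaining -= audio
    return packs


# Six frames do not fill a whole number of packs: the last pack is padded.
print(pack_audio_frames([768] * 6))
# Twenty-one frames fill exactly eight packs: no padding is needed.
print(pack_audio_frames([768] * 21))
```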

(1-2-2-6-1) Selection of Logical Format Based on Free 
Size of Audio Packs 

[0107] For the example shown in Fig. 12, a padding
packet P51 is inserted into the free area in the pack,
though depending on the size of the free area in the
payload, a padding packet P51 may be inserted into the
pack, or stuffing bytes may be inserted into the packet
header. Figs. 13A and 13B respectively show examples
of packs where stuffing bytes and a padding packet
have been inserted.

[0108] When the remaining area in a pack is between
one and seven bytes in size, stuffing bytes are inserted
into the packet header, as shown in Fig. 13A. However,
when the remaining area in the pack is at least eight
bytes in size, a padding packet is inserted into the pack
alongside the audio packet, as shown in Fig. 13B. The
inserted padding packet has a unique header. A demultiplexer
provided in a recording apparatus to separate
the multiplexed video and audio data refers to this
header and discards the data from the header onwards
as invalid data. This means that invalid data is not
accumulated in the audio decoder buffer when a padding
packet is provided in an audio pack, with this data
merely filling the free area in the payload.
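
The selection rule above can be written as a small function. The thresholds (one to seven bytes, eight bytes or more) come from the text; the function name and return labels are illustrative only.

```python
def choose_fill(free_bytes: int) -> str:
    """Pick how to fill the unused area of an audio pack."""
    if free_bytes < 0:
        raise ValueError("free_bytes must not be negative")
    if free_bytes == 0:
        return "none"            # payload is already full
    if free_bytes <= 7:
        return "stuffing_bytes"  # inserted into the packet header (Fig. 13A)
    return "padding_packet"      # inserted alongside the audio packet (Fig. 13B)


for n in (0, 1, 7, 8, 1440):
    print(n, choose_fill(n))
```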

(1-3) Composition of the RTRW Management File 

[0109] The following is a description of the composition
of the RTRW management file. The content of the
RTRW management file can be roughly divided into the
VOB table and the PGC table. A VOB is a physical unit
for indicating the MPEG stream recorded on an optical
disc. On the other hand, a PGC (Program Chain) is a
logical unit that indicates an arrangement of all or only
some of the data divisions in a VOB. PGCs define
reproduction sequences. In Fig. 14, more than four sets of
PGC information numbered PGC information #1, PGC
information #2, PGC information #3, PGC information
#4 ... are present for the three VOBs, VOB #1, VOB #2,
and VOB #3. This shows that four or more PGCs can be
logically defined for three VOBs that physically exist.
[0110] Fig. 14 shows the detailed hierarchical structure
in which data is stored in the RTRW management
file. The logical format shown on the right of Fig. 14 is a
detailed expansion of the data shown on the left, with
the broken lines serving as guidelines to clarify which
parts of the data structure are being expanded.

[0111] From the data structure in Fig. 14, it can be
seen that the RTRW management file includes a
Number_of_VOBIs (showing the number of sets of VOB
information) and VOB information for VOB #1, VOB #2,
and VOB #3. This VOB information for each VOB
includes VOB general information, VOB stream information,
and a time map table.

(1-3-1) Composition of the VOB General Information 

[0112] The VOB general information includes a VOB-ID
that is uniquely assigned to each VOB in an AV file
and VOB recording time information of each VOB.
[0113] The VOB attribute information is composed of






video attribute information and audio attribute informa- 
tion. 

[0114] The video attribute information includes video 
compression mode information that indicates one of 
MPEG2 and MPEG1, TV system information that indi- 
cates one of NTSC and PAL/SECAM, aspect ratio infor- 
mation showing "4:3" or "16:9", video resolution 
information showing "720x480" or "352x240" when the 
video attribute information indicates NTSC, and copy- 
guard information showing the presence/absence of 
copy prevention control for a video tape recorder. 
[0115] The audio attribute information shows the
encoding method that may be one of MPEG, Dolby-AC3,
or Linear-PCM, the sampling frequency (such as
48kHz), and an audio bitrate that is written as a bitrate
when a fixed bitrate is used or as the legend "VBR"
when a variable bitrate is used.
[0116] The time map table shows the presentation 
start time of each VOBU and the address of each VOBU 
relative to the start of the AV file.

(1-3-2) Composition of the PGC Table

[0117] The PGC table includes a Number_of_PGCIs
(showing the number of sets of PGC information) and a
plurality of sets of PGC information. Each set of PGC
information includes a Number_of_CellIs, showing the
number of sets of cell information, and a set of cell
information for each cell. Each set of cell information
includes a VOB_ID, a C_V_S_PTM, and a
C_V_E_PTM.

[0118] The VOB_ID is a column for entering the identifier
of a VOB included in the AV file. When there are a
plurality of VOBs in the AV file corresponding to a set of
cell information, this VOB_ID clearly shows which of the
VOBs corresponds to this cell information.
[0119] The cell start time C_V_S_PTM (abbreviated to
C_V_S_PTM in the drawings) is information showing
the start of the data division that is logically indicated by 
this cell information. In detail, this indicates the video 
field located at the start of the data division. 
[0120] The cell end time C_V_E_PTM (abbreviated to
C_V_E_PTM in the drawings) is information showing 
the end of the data division that is logically indicated by 
this cell information. In detail, this indicates the video 
field located at the end of the data division. 
[0121] The sets of time information given as the cell 
start time C_V_S_PTM and the cell end time 
C_V_E_PTM show the start time for the encoding oper- 
ation by a video encoder and the end time for the encod- 
ing operation, and so indicate a series of images 
marked by the user. As one example, when the user 
marks the images shown in Fig. 15, the C_V_S_PTM 
and C_V_E_PTM in the cell information are set to indi- 
cate the marked video fields with a high degree of pre- 
cision. 
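
The PGC table described in this section can be modelled with a few plain data classes. This is a minimal sketch of the logical layout only: the field names mirror the names in the text, while the Python types, the use of seconds for the time fields, and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CellInfo:
    vob_id: int        # VOB_ID: which VOB this cell refers to
    c_v_s_ptm: float   # cell start time (assumed here to be in seconds)
    c_v_e_ptm: float   # cell end time (assumed here to be in seconds)


@dataclass
class PGCInfo:
    cells: List[CellInfo] = field(default_factory=list)

    @property
    def number_of_cellis(self) -> int:
        """Number_of_CellIs: count of sets of cell information."""
        return len(self.cells)


@dataclass
class PGCTable:
    pgcs: List[PGCInfo] = field(default_factory=list)

    @property
    def number_of_pgcis(self) -> int:
        """Number_of_PGCIs: count of sets of PGC information."""
        return len(self.pgcs)


# Hypothetical table with one PGC made of two cells in VOB #1.
table = PGCTable([PGCInfo([CellInfo(1, 0.0, 12.5), CellInfo(1, 30.0, 41.0)])])
print(table.number_of_pgcis, table.pgcs[0].number_of_cellis)
```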



(1-3-2-1) Reproduction Using the Logical Units (PGCs)

[0122] The following is an explanation of the reproduction
of PGCs. Fig. 16 shows how VOBs are accessed
using PGCs. The dotted arrows in Fig. 16 show the
correspondence between the referring and referred-to
data. The arrows y2, y4, y6, and y8 show the
correspondence between each VOBU in a VOB and the time
codes included in the time map table in the set of VOB
information. The arrows y1, y3, y5, and y7 show the
correspondence between the time codes included in the
time map table in the set of VOB information and sets of
cell information.

[0123] Here, suppose the user indicates reproduction
for one of the PGCs. When the indicated PGC is PGC
#2, the recording apparatus extracts the cell information
#1 (abbreviated to CellI #1) located at the front of PGC
#2. Next, the recording apparatus refers to the AV file
and VOB identifier included in the extracted CellI #1,
and so finds that the AV file and VOB corresponding to
this cell information are AV file #1 and VOB #1, with time
map table #1 being specified for this VOB.
[0124] Since the address relative to the start of the
VOB and the elapsed time are written in the specified
time map table #1, the recording apparatus refers to the
time map table #1 using the cell start time C_V_S_PTM
as shown by arrow y1 and so finds the VOBU in the AV
file that corresponds to the cell start time C_V_S_PTM
included in cell information #1 and the start address of
this VOBU. Once the start address of the VOBU
corresponding to the cell start time C_V_S_PTM is known,
the recording apparatus accesses VOB #1 as shown by
arrow y2 and starts to read the VOBU sequence starting
from VOBU #1 that is indicated by this start address.

[0125] Here, since the cell end time C_V_E_PTM is included in cell
information #1 along with the cell start time C_V_S_PTM, the recording
apparatus refers to the time map table #1 using the cell end time
C_V_E_PTM, as shown by the dotted arrow y3. As a result, the recording
apparatus can find out which VOBU in the AV file corresponds to the
cell end time C_V_E_PTM included in cell information #1 and can obtain
the end address of this VOBU. Supposing that the VOBU indicated in
this way is VOBU #i, the recording apparatus will read the VOBU
sequence as far as the end of the VOBU #i that is indicated by the
arrow y4 in Fig. 16. By accessing the AV file via the cell information
#1 and the VOB information #1 in this way, the recording apparatus can
read only the data in VOB #1 of AV file #1 that is specified by cell
information #1. By repeating this selective reading of data using cell
information #2 and cell information #3, the recording apparatus can
read and reproduce all of the VOBUs included in VOB #1.
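
The lookup described in paragraphs [0124] and [0125] can be sketched as follows. This is only an illustration under stated assumptions, not the data format defined in this specification: the `Vobu` record, its field names, the time units, and the `vobus_for_cell` helper are all hypothetical, and the time map table is modelled simply as a list of VOBUs, each with a start time, a start address relative to the start of the VOB, and a size.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Vobu:
    start_time: int  # presentation start time of the VOBU (hypothetical units)
    address: int     # start address relative to the start of the VOB
    size: int        # size of the VOBU


def vobus_for_cell(time_map, c_v_s_ptm, c_v_e_ptm):
    """Return (first_index, last_index, start_address, end_address) for a cell."""
    starts = [v.start_time for v in time_map]
    # The VOBU containing a given time is the last one that starts at or before it.
    first = bisect_right(starts, c_v_s_ptm) - 1
    last = bisect_right(starts, c_v_e_ptm) - 1
    if first < 0 or last < first:
        raise ValueError("cell times fall outside the time map")
    start_address = time_map[first].address
    end_address = time_map[last].address + time_map[last].size
    return first, last, start_address, end_address


# Hypothetical time map with four VOBUs.
time_map = [Vobu(0, 0, 100), Vobu(30, 100, 120), Vobu(60, 220, 90), Vobu(90, 310, 110)]
print(vobus_for_cell(time_map, c_v_s_ptm=35, c_v_e_ptm=75))
```

The reading range runs from the start address of the VOBU containing C_V_S_PTM to the end address of the VOBU containing C_V_E_PTM, which mirrors arrows y1 to y4 in Fig. 16.
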
[0126] By performing reproduction based on sets of PGC information,
the recording apparatus can reproduce the data in a VOB according to
the order in which it is indicated by the sets of PGC information.
[0127] Partial reproduction of a PGC is also possible by having the
user indicate cells that are included in a PGC. Cells are parts of a
VOB that are specified using time information for video fields, so
that the user is able to view scenes that he/she has indicated very
precisely. However, the user is not able to directly indicate the
reproduction of a data division, such as a VOBU, that is smaller than
one cell.

(1-3-2-2) Partial Deletion for a PGC

[0128] The partial deletion of a VOB is performed with
a VOBU as the minimum unit. This is because each 
VOBU includes (a) GOP(s) in the video stream and 
because the boundaries between VOBUs will definitely 
match boundaries between sets of audio frame data. 
The procedure when performing a partial deletion in the 
present embodiment is described below. 
[0129] In the following example, the PGC information 
#2 shown in Fig. 16 is composed of the cells #1 to #3, 
with cell #2 being subjected to a partial deletion. In Fig. 
17, the area that corresponds to the deleted cell is 
shown using diagonal shading. 
[0130] As shown within frame w11 in Fig. 17, cell #2 that is to be
deleted indicates one of the video frames, out of the plurality of
sets of picture data included in VOBU #i+1, using the cell start time
C_V_S_PTM. As shown within frame w12, cell #2 also indicates one of
the video frames, out of the plurality of sets of picture data
included in VOBU #j+1, using the cell end time C_V_E_PTM.

[0131] Fig. 18A shows the extents that are freed by a partial deletion
using PGC information #2. As shown on the second level of Fig. 18A,
VOBUs #i, #i+1, and #i+2 are recorded in the extent #m, and VOBUs #j,
#j+1, and #j+2 are recorded in the extent #n.
[0132] As shown in Fig. 18A, cell #2 indicates picture 
data included in VOBU #i+1 as the cell start time 
C_V_S_PTM and picture data included in VOBU #j+1 
as the cell end time C_V_E_PTM. This means that the 
area from the extent that VOBU #i+2 occupies to the 
extent that VOBU #j occupies is freed to become an
unused area. However, the extents that VOBU #i and 
VOBU #i+1 occupy and the extents that VOBU #j+1 and 
VOBU #j+2 occupy are not freed. 
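
The rule in the paragraph above can be sketched in a few lines. This is a minimal illustration that assumes VOBUs are numbered consecutively within the VOB; the helper name `freed_vobus` is hypothetical, and the case where a cell starts or ends exactly on a VOBU boundary is not treated specially because the text does not discuss it.

```python
def freed_vobus(start_vobu, end_vobu):
    """Indices of the VOBUs freed when a cell is partially deleted.

    start_vobu and end_vobu are the indices of the VOBUs that contain the
    cell start time and the cell end time. These two VOBUs are only partly
    covered by the cell, so they are kept; only the VOBUs lying wholly
    between them are freed.
    """
    if end_vobu < start_vobu:
        raise ValueError("cell end precedes cell start")
    return list(range(start_vobu + 1, end_vobu))


# Example with i = 10 and j = 20: the cell starts in VOBU #i+1 (11) and ends
# in VOBU #j+1 (21), so VOBU #i+2 (12) through VOBU #j (20) are freed.
print(freed_vobus(11, 21))
```
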
[0133] Fig. 18B shows examples of the VOB, VOB 
information, and PGC information after the partial dele- 
tion described above. Since the part corresponding to 
the former cell #2 has been deleted, VOB #1 is now 
composed of the new pair of VOBU #1 and VOBU #2. 
[0134] The VOB information for VOB #1 is divided into 
VOB information #1 and VOB information #2. The time 
map tables that are included in these sets of VOB infor- 
mation are divided into time map table #1 and time map 
table #2. 

[0135] Figs. 19A and 19B show VOBU #i+1 and VOBU #i+2 before and after
the partial deletion described above. Of these, Fig. 19A shows the
state before the partial deletion and has the same content as Fig. 11.
In Fig. 19B, the data from VOBU #i+2 onwards has been deleted. Since
the boundary between VOBU #i+1 and VOBU #i+2 matched the boundary
between the sets of audio frame data y-1 and y, the partial deletion
of data from VOBU #i+2 onwards results in the audio frame data up to
audio frame data y-1 being left and the audio frame data from audio
frame data y onwards being deleted.

[0136] Figs. 20A and 20B show VOBU #j and VOBU #j+1 before and after
the partial deletion described above. Of these, Fig. 20A shows the
state before the partial deletion and has the same content as Fig. 12.
In Fig. 20B, the data up to VOBU #j has been deleted. Since the
boundary between VOBU #j and VOBU #j+1 matched the boundary between
the sets of audio frame data x-1 and x, the partial deletion of data
up to VOBU #j results in the audio frame data up to audio frame data
x-1 being deleted and the audio frame data from audio frame data x
onwards being left.
[0137] Since the boundaries between VOBUs match the boundaries between
sets of audio frame data, it can be seen that partial deletes that are
performed in VOBU units have no danger of leaving only part of a set
of audio frame data on the optical disc.

(2-1) System Construction of the Recording Apparatus 

[0138] The recording apparatus of the present embodiment has functions
for both a DVD-RAM reproduction apparatus and a DVD-RAM recording
apparatus. Fig. 21 shows an example of the system construction that
includes the recording apparatus of the present embodiment. As shown
in Fig. 21, this system includes a recording apparatus (hereinafter
DVD recorder 70), a remote controller 71, a TV monitor 72 that is
connected to the DVD recorder 70, and an antenna 73. The DVD recorder
70 is conceived as a device to be used in place of a conventional
video tape recorder for the recording of television broadcasts, but
also features editing functions. Fig. 21 shows a system where the DVD
recorder 70 is used as a domestic video appliance. The DVD-RAM
described above is used by the DVD recorder 70 as the recording medium
for recording television broadcasts.

[0139] When a DVD-RAM is loaded into the DVD recorder 70, the DVD
recorder 70 compresses a video signal received via the antenna 73 or a
conventional NTSC signal and records the result onto the DVD-RAM as
VOBs. The DVD recorder 70 also decompresses the video streams and
audio streams included in the VOBs recorded on a DVD-RAM and outputs
the resulting video signal or NTSC signal and audio signal to the TV
monitor 72.

(2-2) Hardware Construction of the DVD Recorder 70

[0140] Fig. 22 is a block diagram showing the hardware construction of
the DVD recorder 70. The DVD recorder 70 includes a control unit 1, an
MPEG encoder 2, a disc access unit 3, an MPEG decoder 4, a video
signal processing unit 5, a remote controller 71, a bus 7, a remote
control signal reception unit 8, and a receiver 9.

[0141] The arrows drawn with solid lines in Fig. 22 
show the physical connections that are achieved by the 
circuit wiring inside the DVD recorder 70. The broken 
lines, meanwhile, show the logical connections that indi- 
cate the input and output of various kinds of data on the 
connections shown with the solid lines during a video 
editing operation. 

[0142] The control unit 1 is the host-side control unit 
that includes the CPU 1a, the processor bus 1b, the bus 
interface 1c, the main storage 1d, and the ROM 1e. By 
executing programs stored in the ROM 1e, the control 
unit 1 records and reproduces VOBs. 
[0143] The MPEG encoder 2 operates as follows. 
When the receiver 9 receives an NTSC signal via the 
antenna 73, or when a video signal outputted by a 
domestic video camera is received via the video input 
terminals at the back of the DVD recorder 70, the MPEG 
encoder 2 encodes the NTSC signal or video signal to 
produce VOBs. The MPEG encoder 2 then outputs 
these VOBs to the disc access unit 3 via the bus 7. 
[0144] The disc access unit 3 includes a track buffer 
3a, an ECC processing unit 3b, and a drive mechanism 
3c for a DVD-RAM, and accesses the DVD-RAM in 
accordance with control by the control unit 1.
[0145] In more detail, when the control unit 1 gives an
indication for recording on the DVD-RAM and the VOBs
encoded by the MPEG encoder 2 have been succes- 
sively outputted as shown by the broken line (1), the 
disc access unit 3 stores the received VOBs in the track 
buffer 3a. After the ECC processing unit 3b performs 
ECC processing, the disc access unit 3 controls the 
drive mechanism 3c to successively record these VOBs 
onto the DVD-RAM. 

[0146] On the other hand, when the control unit 1 indicates
a data read from a DVD-RAM, the disc access
unit 3 controls the drive mechanism 3c to successively 
read VOBs from the DVD-RAM. After the ECC process- 
ing unit 3b performs ECC processing on these VOBs, 
the disc access unit 3 stores the result in the track buffer 
3a. 

[0147] The drive mechanism 3c mentioned here 
includes a platter for setting the DVD-RAM, a spindle 
motor for clamping and rotating the DVD-RAM, an optical
pickup for reading a signal from the DVD-RAM, and
an actuator for the optical pickup. Reading and writing 
operations are achieved by controlling these compo- 
nents of the drive mechanism 3c, although such control 
does not form part of the gist of the present invention. 
Since this can be achieved using well-known methods, 
no further explanation will be given in this specification. 
[0148] The MPEG decoder 4 operates as follows. When VOBs that have
been read from the DVD-RAM by the disc access unit 3 are outputted as
shown by the broken line (2), the MPEG decoder 4 decodes these VOBs to
obtain uncompressed digital video data and an audio signal. The MPEG
decoder 4 outputs the uncompressed digital video data to the video
signal processing unit 5 and outputs the audio signal to the TV
monitor 72.
[0149] The video signal processing unit 5 converts the image data
outputted by the MPEG decoder 4 into a video signal for the TV monitor
72. On receiving graphics data from outside, the video signal
processing unit 5 converts the graphics data into an image signal and
performs signal processing to combine this image signal with the video
signal.

[0150] The remote control signal reception unit 8 
receives a remote controller signal and informs the control
unit 1 of the key code in the signal so that the control
unit 1 can perform control in accordance with user oper- 
ations of the remote controller 71. 

(2-2-1) Internal Construction of the MPEG Encoder 2 

[0151] Fig. 23A is a block diagram showing the construction of the
MPEG encoder 2. As shown in Fig. 23A, the MPEG encoder 2 is composed
of a video encoder 2a, a video encoding buffer 2b for storing the
output of the video encoder 2a, an audio encoder 2c, an audio encoding
buffer 2d, a system encoder 2e for multiplexing the encoded video
stream in the video encoding buffer 2b and the encoded audio stream in
the audio encoding buffer 2d, an STC (System Time Clock) unit 2f for
generating the synchronization clock of the MPEG encoder 2, and the
encoder control unit 2g for controlling and managing these components
of the MPEG encoder 2. Of these, the audio encoder 2c encodes audio
information that is inputted from outside to generate a plurality of
sets of audio frame data that are the minimum data unit that can be
independently decoded. The audio encoding buffer 2d stores the
plurality of sets of audio frame data encoded by the audio encoder 2c
in the order in which they were generated.


(2-2-2) Internal Construction of the System Encoder 2e 

[0152] Fig. 23B shows the internal construction of the system encoder
2e. As shown in Fig. 23B, the system encoder 2e includes an audio
packing unit 15, a virtual decoder buffer 16, a virtual presentation
time counting unit 17, a video packing unit 18, a virtual decoder
buffer 19, and an interleaving unit 20. The audio packing unit 15
converts the sets of audio frame data stored in the audio encoding
buffer 2d into packs. The virtual decoder buffer 16 simulates the
buffer state when the packs that store the sets of audio frame data
are inputted into a buffer. The virtual presentation time counting
unit 17 measures time that is used for assigning an SCR and a PTS
based on the synchronization clock of the STC 2f. The video packing
unit 18 converts the video data stored in the video encoding buffer 2b
into packs. The virtual decoder buffer 19 simulates the buffer state
when the packs that store the sets of video data are inputted into a
buffer. The interleaving unit 20 generates VOBs by arranging the video
packs and audio packs in accordance with the SCR and PTS assigned to
the video packs and audio packs. In the present embodiment, the
conversion of the audio frame data into packs by the audio packing
unit 15 is the main focus, so that this is explained in detail. No
detailed description of the generation of video packs by the video
packing unit 18 will be given.

(2-2-2-1) Buffer Control by the Audio Packing Unit 15

[0153] The audio packing unit 15 extracts an amount 
of data equivalent to the payload size from the encoded 
audio frame data accumulated in the audio encoding 
buffer 2d. The audio packing unit 15 then generates a 
pack that stores the extracted data in its payload, and 
outputs the generated pack to the system encoder 2e. 
This generation of a pack involves the arrangement of 
data into a payload and the calculation of the input time 
of this pack into an audio decoder buffer. 
[0154] The calculation of the input time of a pack into the audio
decoder buffer is performed so that the buffer state of the audio
decoder buffer can be efficiently controlled. In the model of a
reproduction apparatus under the DVD standard, the memory capacity of
the audio decoder buffer is a mere 4KB, which equates to only twice
the data size of the audio packs used as the unit when reading from a
DVD-RAM. As a result, there is a risk of an overflow occurring in the
audio decoder buffer if there are no restrictions regarding the input
times of audio frame data or the number of sets of audio frame data
inputted into the audio decoder buffer at any one time. However, if
such restrictions are unsuitable, the opposite case can occur where
the audio frame data that needs to be decoded is not present in the
audio decoder buffer. This causes an underflow in the audio decoder
buffer.

[0155] To avoid underflows and overflows, the audio packing unit 15
uses the virtual decoder buffer 16 to simulate increases in the
occupancy of the audio decoder buffer of a decoder when packs are
inputted and decreases in the occupancy as time passes. By doing so,
the audio packing unit 15 calculates input times for audio packs so
that no underflows or overflows occur in the audio decoder buffer. By
giving packs SCRs that show input times calculated in this way, the
audio packing unit 15 ensures that overflows and underflows will not
occur in the audio decoder buffer. When doing so, the audio packing
unit 15 must not assign an SCR to an audio pack that corresponds to an
SCR of a video pack. To ensure this, the audio packing unit 15 informs
the video packing unit 18 of the SCRs that have already been assigned
to packs, and the video packing unit 18 assigns SCRs to video packs
that do not correspond to the SCRs of audio packs.
[0156] The simulation of the audio decoder buffer using the virtual
decoder buffer 16 is performed by graphing the buffer state shown in
Fig. 8 in the virtual decoder buffer 16, with the time measured by the
virtual presentation time counting unit 17 as the horizontal axis.

[0157] The audio packing unit 15 has the virtual presentation time
counting unit 17 start to measure time. When the first audio frame
data accumulated in the audio encoding buffer 2d has been stored in
the first pack, the audio packing unit 15 increases the buffer
occupancy by the data amount for this first pack and plots an inclined
part for the time measured by the virtual presentation time counting
unit 17 based on the input bit rate of the pack.
[0158] The virtual presentation time counting unit 17 continues to
measure time and the audio packing unit 15 plots a stepped part in the
graph every time the time measured by the virtual presentation time
counting unit 17 reaches the presentation start time of an audio
frame. The audio packing unit 15 repeatedly plots stepped parts and,
when a free region equivalent to the payload of a pack appears in the
audio decoder buffer, stores the audio frame data accumulated in the
audio encoding buffer 2d into the next pack and gives the pack an SCR
showing the time at that point. By repeating this procedure, the audio
packing unit 15 converts audio frame data into packs.
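
The buffer simulation described above can be approximated by a small discrete-time model. The sketch below is an assumption-laden simplification rather than the embodiment itself: time advances in abstract ticks, the inclined part (pack input) is treated as instantaneous, at most one pack is input per tick, and the payload size of 2,016 bytes and the function name are hypothetical. Only the 4KB capacity and the 768-byte frame size come from the text.

```python
def simulate_audio_buffer(total_bytes, payload, frame_size, frame_period,
                          capacity=4096, max_ticks=10_000):
    """Return a list of (tick, payload_bytes) pack input times (SCR stand-ins)."""
    occupancy = 0
    remaining = total_bytes  # bytes still waiting in the audio encoding buffer
    packs = []
    for tick in range(max_ticks):
        # Stepped part: one frame leaves the buffer at each presentation start time.
        if tick > 0 and tick % frame_period == 0 and occupancy >= frame_size:
            occupancy -= frame_size
        # Inclined part (instantaneous here): input one pack as soon as a free
        # region the size of its payload exists in the buffer.
        if remaining > 0:
            chunk = min(payload, remaining)
            if occupancy + chunk <= capacity:
                occupancy += chunk
                remaining -= chunk
                packs.append((tick, chunk))
        if remaining == 0 and occupancy < frame_size:
            break
    return packs


# Eight 768-byte frames, one frame presented every 32 ticks (hypothetical timing).
packs = simulate_audio_buffer(total_bytes=768 * 8, payload=2016,
                              frame_size=768, frame_period=32)
for tick, size in packs:
    print(f"pack of {size} bytes input at tick {tick}")
```

Each recorded tick plays the role of the SCR given to the pack: input is delayed until enough frames have been presented to free a payload-sized region.
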

(2-2-2-2) Buffer Control so that VOBU and Audio Frame
Data Boundaries Match 

[0159] In addition to performing the simulation of the buffer state as
described above, the audio packing unit 15 of the present embodiment
has a characteristic feature in that it performs buffer control so
that the boundaries of VOBUs match boundaries between sets of audio
frame data. This buffer control controls the audio decoder buffer so
that when the last (audio) pack in a VOBU has been transferred, the
audio frame data accumulated in the audio decoder buffer will complete
an entire audio frame. When such buffer control is maintained, the
boundaries between VOBUs will definitely match boundaries between sets
of audio frame data.
[0160] Fig. 24 shows the case where the boundaries between sets of
audio frame data match the boundaries between VOBUs.

[0161] The top part of Fig. 24 shows the transition in the buffer
state of the video decoder buffer. Below this, the video pack sequence
that causes the illustrated transition in the buffer state is shown.
In Fig. 24, the sets of picture data v11, v12, v13, v14, v15, and v16
are shown, with video pack p31 storing the final picture data v15 as
the final pack in a VOBU. The video pack p34 stores the first picture
data v16 in the next VOBU.
[0162] A pack sequence where video packs and audio packs have been
multiplexed is shown below this in Fig. 24. The bottom part of Fig.
24, meanwhile, shows the transition in the buffer state of the audio
decoder buffer. A vertical line drawn at the right side of this graph
is marked with V at each boundary between sets of audio frame data.

[0163] The final video pack p31 in the multiplexed 
pack sequence has the audio pack p32 immediately 
before it. The transfer of this audio pack p32 causes the 
increase in the occupancy of the audio decoder buffer 
shown by the inclined part k1. As shown by the graph at
the bottom of Fig. 24, an amount of audio frame data
equal to exactly four audio frames is stored in the audio 
decoder buffer. This shows that the VOBU boundary 
matches a boundary between audio frames. 
[0164] On the other hand, when only part of the audio
frame data is stored in the audio decoder buffer, the 
boundary between VOBUs does not match a boundary 
between sets of audio frame data. When the boundaries 
do not match, the audio packing unit 15 can have only 
the remaining part of a set of audio frame data trans- 
ferred so that the boundary between VOBUs matches a 
boundary between audio frames. 
[0165] Fig. 25 shows how the audio packing unit 15 
has only the remaining part of a set of audio frame data 
transferred so that the boundary between VOBUs 
matches the boundary between audio frames. 
[0166] The top part of Fig. 25 and the video pack 
sequence below it are the same as in Fig. 24. Below 
this, the video pack p31 stores the final picture data v15
as the final pack in a GOP with the audio pack p32
immediately before it, as in Fig. 24. The transfer of this
audio pack p32 causes the increase in the occupancy of
the audio decoder buffer shown by the inclined part k1,
as in Fig. 24. However, after this transfer of audio pack 
p32, the graph in Fig. 25 differs in that the audio 
decoder buffer stores audio frame data for four frames 
and one part of the audio frame data for a fifth audio 
frame. 

[0167] As shown by the point k2 on the inclined part 
k1, the boundary between VOBUs does not match a 
boundary between sets of audio frame data. At the bot- 
tom of Fig. 25, the reaching of the presentation start 
time of an audio frame results in a reduction in the buffer 
occupancy, as shown by the stepped part k5. The 
height of this stepped part is equivalent to the data size 
of one set of audio frame data, so that the audio
decoder buffer ends up storing an incomplete amount of 
audio frame data. 

[0168] In this state, the boundary between VOBUs does not match the
boundary between sets of audio frame data, so that in Fig. 25, the
audio pack p33 is arranged immediately after the video pack p31 and
immediately before the video pack p34. The audio pack p33 stores the
remaining part of a set of audio frame data, so that by inputting this
audio pack p33, the inclined part k3 is produced in the graph at the
bottom of Fig. 25. As a result, the buffer occupancy of the audio
decoder buffer increases to the level shown as k4 that represents an
amount of audio frame data that is exactly equal to four sets of audio
frame data. This shows that the boundary of VOBUs matches a boundary
between sets of audio frame data.

[0169] Notification of the final video pack in a VOBU is unexpectedly
sent from the video packing unit 18. As a result, the audio packing
unit 15 has to suddenly arrange the remaining part of the audio frame
data as described above.

[0170] It should be especially noted that the size of the audio
decoder buffer is only 4KB, so that there can be many cases where the
transfer of an audio pack at the end of a VOBU, such as the transfer
of audio pack p33 in the preceding example, will not be possible. One
example of this is the case where 4KB of audio frame data is stored in
the audio decoder buffer even though the final set of audio frame data
has only been partly stored. Since the capacity of the audio decoder
buffer is 4KB, which is 5.333... (4096 bytes/768 bytes) times the data
size of the audio frame data, it can be seen that this represents a
non-integer number of sets of audio frame data.

[0171] Fig. 26A shows the state where 4KB of audio frame data is
stored in the audio decoder buffer, though the final set of audio
frame data is only partly stored. The upper part of Fig. 26A shows
that the video pack p31, which is the final video pack in a VOBU, has
the audio pack p32 positioned immediately before it, in the same way
as in Fig. 25.

[0172] The vertical broken lines that descend from the audio pack p32
indicate the inclined part k1 that shows the increase in buffer
occupancy caused by audio pack p32. The horizontal line that extends
from the point k2 at the peak of the inclined part k1 does not cross
the vertical guideline at a boundary between sets of audio frame data,
as in Fig. 25. The difference with Fig. 25 is that the buffer
occupancy at the point k2 is 4,096 bytes. Since 4,096 bytes of audio
frame data are already stored in the audio decoder buffer, transfer of
the audio pack p33 to the audio decoder buffer in the same way as in
Fig. 25 will cause an overflow in the audio decoder buffer.
[0173] In this case, it is impossible to input the remaining part of
the audio frame data in audio pack p33 into the audio decoder buffer,
so that the boundary between VOBUs does not match the boundary between
sets of audio frame data.

[0174] Buffer control is performed by the audio packing unit 15 so as
to particularly avoid the situation described above where the audio
decoder buffer is completely filled with audio frame data. In detail,
the audio packing unit 15 has a buffer state maintained where the
predetermined data amount BSa' is set as the upper limit for the
amount of data in the audio decoder buffer. Fig. 26B shows the
transition in the buffer state when the amount of data in the audio
decoder buffer is subjected to the upper limit BSa' and buffer control
is performed so that the amount of accumulated data in the audio
decoder buffer does not exceed BSa'.

[0175] The rules for the determination of this upper limit BSa' depend
on the algorithm used by the encoder and there is no especially
favorable method for their establishment. In the present embodiment,
BSa' is set as the value found by the equations below, where the data
size of one audio frame is represented by "Aaudio".

Br = 4KB % Aaudio (Equation 2-1)

BSa' = 4KB - Br (Equation 2-2)

where "%" represents a calculation that finds a remainder.

[0176] The use of the above equations means that the upper limit for
the amount of data in the audio decoder buffer is an integer multiple
of the size of the data in one audio frame. This means that the amount
of data accumulated in the audio decoder buffer will not exceed this
predetermined amount BSa'. Since the amount of accumulated data in the
audio decoder buffer will not exceed the value of BSa' found according
to Equation 2-2, there will always be enough space in the audio
decoder buffer to input the remaining data in an audio frame. To give
actual numerical examples, when Dolby-AC3 and a bitrate of 192Kbps are
used, the value of Aaudio will be 768 bytes, so that Br will be 256
bytes (= 4,096 bytes - (768 bytes * 5)). This means that in Fig. 26B,
the amount of accumulated data in the audio decoder buffer is
subjected to an upper limit BSa' of 3,840 bytes.
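
As a quick check of the arithmetic, the upper limit can be computed directly from the buffer capacity and the frame size. The function name below is hypothetical; the 4KB capacity and the 768-byte frame size are the values given in the text.

```python
AUDIO_DECODER_BUFFER = 4096  # 4KB, as stated in the text


def upper_limit(frame_size, capacity=AUDIO_DECODER_BUFFER):
    """Return (Br, BSa'): the remainder and the resulting upper limit."""
    br = capacity % frame_size  # remainder left after whole frames
    bsa = capacity - br         # largest whole number of frames that fits
    return br, bsa


br, bsa = upper_limit(768)
print(br, bsa, bsa // 768)
```

With a 768-byte frame the limit corresponds to exactly five frames, which is why space for the rest of a partly stored frame is always available below the 4KB capacity.
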
[0177] When storing audio frame data into a pack, the audio packing
unit 15 judges whether a value found by adding the accumulated data
amount in the virtual decoder buffer 16 to the payload size is no
greater than the predetermined size BSa'. If so, the audio packing
unit 15 generates the next pack and assigns an SCR that shows the
present time to the header. When the total of the accumulated data
amount in the virtual decoder buffer 16 and the payload size is
greater than the predetermined size BSa', the audio packing unit 15
waits for the accumulated data amount to be reduced by the decoding of
the next audio frame. When the accumulated data amount has been
sufficiently reduced for the total of the accumulated data amount and
the payload size to be within the predetermined size BSa', the audio
packing unit 15 generates the next pack and assigns an SCR showing the
time at that point to the header.
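
The admission rule in the paragraph above can be sketched as a single function. This is a hedged illustration only: the function name, its parameters, the tick-based time units, and the 2,016-byte payload used in the example are assumptions, and decoding of a frame is modelled as an instantaneous removal of one frame from the buffer.

```python
def next_pack_time(occupancy, payload, frame_size, now, next_frame_time,
                   frame_period, limit):
    """Return (scr, occupancy_after_input) for the next audio pack.

    The pack is input at once if occupancy + payload stays within the limit;
    otherwise input waits until enough audio frames have been decoded.
    """
    t = now
    while occupancy + payload > limit:
        if occupancy < frame_size:
            raise ValueError("payload can never fit under the limit")
        # Wait for the next audio frame to be decoded and removed.
        t = max(t, next_frame_time)
        occupancy -= frame_size
        next_frame_time += frame_period
    return t, occupancy + payload


# 3,072 bytes accumulated, limit BSa' = 3,840 bytes, next frame decoded at tick 128.
print(next_pack_time(occupancy=3072, payload=2016, frame_size=768,
                     now=100, next_frame_time=128, frame_period=32, limit=3840))
```

In this example two frames must be decoded before the pack fits, so the SCR stand-in is the second decode time rather than the present time.
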
[0178] The following is a description of the procedure by which the
audio packing unit 15 simulates the state of the audio decoder buffer
and generates audio packs based on the principles described above.
Fig. 27 is a flowchart that shows the procedure by which the audio
packing unit 15 generates audio packs while simulating the state of
the audio decoder buffer.
[0179] In step S1, the audio packing unit 15 has the virtual
presentation time counting unit 17 start to count the virtual
presentation time t. In step S2, the audio packing unit 15 extracts
audio frame data of a predetermined size from the start of the
arrangement of sets of audio frame data stored in the audio encoding
buffer 2d. The audio packing unit 15 stores this extracted audio frame
data in a pack. Based on the virtual presentation time t, the audio
packing unit 15 assigns an SCR and PTS to generate an audio pack. The
audio packing unit 15 adds the payload size of the pack to the amount
of accumulated data in the buffer, and plots an inclined part in the
virtual decoder buffer 16.
[0180] In step S3, the audio packing unit 15 judges 
whether the virtual presentation time t counted by the 
virtual presentation time counting unit 17 has reached 
the presentation start time of an audio frame. If not, in 
step S4 the audio packing unit 15 determines whether 
the input-possible time of an audio pack has been 
reached. If not, in step S5, the audio packing unit 15 
judges whether the notification of the storage of the final 
video pack in a VOBU has been given. When the result 
"No" is given in every judgement in steps S3 to S5, the 
audio packing unit 15 proceeds to step S6 where it has 
the virtual presentation time counting unit 17 increment 
the virtual presentation time t. 
[0181] The incrementing in step S6 is repeated until 
the result "Yes" is given in one of judgements in steps 
S3 to S5. This repeated incrementing of the virtual pres- 
entation time t results in the virtual presentation time t 
reaching the presentation start time of a set of audio 
frame data. When this is the case, the result "Yes" is 
given in step S3, and the procedure advances to step
S7. In step S7, the audio packing unit 15 plots a stepped
part in the virtual decoder buffer 16 to reduce the
amount of accumulated data in the buffer by the size of
the audio frame data. The processing then advances to 
step S6 where the virtual presentation time t is incre- 
mented again, before entering the loop processing in 
steps S3 to S6. 

[0182] On the other hand, when the repeated incre- 
menting of the virtual presentation time t results in the 
virtual presentation time t reaching the input-possible 
time of an audio pack, the processing advances to step
S8, where the audio packing unit 15 judges whether a
size given by adding the amount of data accumulated in 
the buffer to the payload size is within the predeter- 
mined size BSa'. 

[0183] If this size exceeds the predetermined size 
BSa', there is the danger that input of the audio pack 
into the audio decoder buffer will cause an overflow in 
the audio decoder buffer. As a result, the processing 
advances to step S6 and then back to the loop from S3 
to S6 so that the audio packing unit 15 waits for the 
amount of accumulated data in the audio decoder buffer 
to decrease. 

[0184] If the calculated size is below the predeter- 
mined size BSa', the processing advances to step S9 
where the audio packing unit 15 extracts audio frame 
data of a predetermined size from the start of the 
arrangement of sets of audio frame data stored in the 
audio encoding buffer 2d. The audio packing unit 15 
arranges this extracted audio frame data in a payload of 
an audio pack. Based on the virtual presentation time t, 
the audio packing unit 15 assigns an SCR and PTS to the header to
generate an audio pack. At the same time, the audio packing unit 15
adds the payload size of the pack to the amount of accumulated data in
the buffer, and plots an inclined part in the virtual decoder buffer
16. The processing then proceeds to step S6 where the virtual
presentation time t is incremented, before the processing enters the
loop of steps S3 to S6 again.
[0185] The incrementing of the virtual presentation time t is repeated
until the audio packing unit 15 unexpectedly receives notification
from the video packing unit 18 that the final video pack in a VOBU has
been stored.

[0186] On being informed that the final video pack in a VOBU has been
stored, the audio packing unit 15 advances to step S10 where it finds
the remainder "Frame_Remain" that is left when the capacity of the
buffer is divided by the size of one set of audio frame data. Next, in
step S11, the audio packing unit 15 judges whether the size of
Frame_Remain is zero. If so, the processing proceeds to step S6 where
the virtual presentation time t is incremented before the processing
enters the loop of steps S3 to S6. If not, the processing advances to
step S12, where the audio packing unit 15 extracts the remaining audio
frame data from the start of the arrangement of sets of audio frame
data stored in the audio encoding buffer 2d. The audio packing unit 15
arranges this extracted audio frame data in a payload of an audio
pack.

[0187] Based on the virtual presentation time t, the
audio packing unit 15 assigns an SCR and PTS to the
header to generate an audio pack. The processing then
advances to step S13 where the audio packing unit 15
judges whether the difference between the payload size
and the data size of Frame_Remain is 8 bytes or more.
If so, in step S14 the audio packing unit 15 stores a
padding packet in the audio pack. On the other hand, if the
difference is less than 8 bytes, in step S15 the audio
packing unit 15 stores stuffing bytes into the packet
header of the audio pack. After this, the processing
proceeds to step S6 where the virtual presentation time t is
incremented, before the processing enters the loop of
steps S3 to S6 once again.
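
The judgement of steps S13 to S15 can be sketched as follows. This is a minimal Python illustration, not the implementation of the audio packing unit 15: the function name, the returned labels, and the handling of an exactly filled payload are assumptions, while the 8-byte threshold is the value stated above.

```python
def choose_fill(payload_size, frame_remain, threshold=8):
    """Decide how to fill the unused part of the final audio pack.

    payload_size: bytes available in the payload of the audio pack.
    frame_remain: bytes of remaining audio frame data to be stored.
    Returns a (kind, gap) tuple, where gap is the unused byte count.
    """
    if frame_remain > payload_size:
        raise ValueError("remaining frame data exceeds the payload")
    gap = payload_size - frame_remain
    if gap == 0:
        # Assumption: nothing needs to be inserted when the data fits exactly.
        return ("none", 0)
    if gap >= threshold:
        return ("padding_packet", gap)   # step S14
    return ("stuffing_bytes", gap)       # step S15


print(choose_fill(2000, 1500))  # large gap: padding packet
print(choose_fill(2000, 1995))  # small gap: stuffing bytes
```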

[0188] Since the audio encoding buffer 2d stores the
plurality of sets of audio frame data encoded by the
audio encoder 2c in the order in which they have been
encoded, the audio packing unit 15 may judge whether
the next audio frame data to be stored has been partly
stored in the immediately preceding audio pack by
referring to the data size of the audio frame data
in the audio encoding buffer 2d.

(2-2-2-3) Procedure for the Partial Deletion of a VOB 


[0189] The control unit 1 performs partial delete
operations using standard functions for accessing a data
format standardized under ISO/IEC 13346. The standard
functions provided by the control unit 1 here refer to
control of the disc access unit 3 to read data from or
write data onto the DVD-RAM in directory units and file
units.
[0190] Representative examples of the standard functions
provided by the control unit 1 are as follows.

1. Having the disc recording unit 100 record a file 
entry and obtaining the file identification descriptor. 

2. Converting a recorded area on the disc that 
includes one file into an empty area. 

3. Controlling the disc access unit 3 to read the file 
identification descriptor of a specified file from a 
DVD-RAM. 

4. Controlling the disc access unit 3 to record data 
present in the memory onto the disc. 

5. Controlling the disc access unit 3 to read an 
extent that composes a file recorded on the disc. 

6. Controlling the disc access unit 3 to move the 
optical pickup to a desired position in the extents 
that compose a file. 

[0191] The following is an explanation of the processing
of the control unit 1 when performing a partial delete
based on the procedure shown in Figs. 17, 18A, and
18B. Fig. 28 is a flowchart showing the processing when
performing a partial delete of a VOB. In step S21 of this
flowchart, the control unit 1 first updates the VOB
information and PGC information as shown in Figs. 17, 18A,
and 18B, and updates the file entries.
[0192] In step S22, the control unit 1 refers to the
relative address of the VOBU that is given in the time map
information, and specifies extents that correspond to 
the VOBUs that compose the deleted area. Here, the 
deleted area may correspond to one extent, or to two or 
more extents. The reason a deleted area composed of 
a plurality of VOBUs may correspond to a plurality of 
extents is that an AV file is divided into a plurality of 
extents completely independently of the structure of the 
VOBUs. 

[0193] After the extents have been specified in this 
way, the processing advances to step S30. Step S30 
marks the start of a loop composed of the steps from 
step S23 to S29 that is performed for each of the speci- 
fied extents. 

[0194] In step S23, the control unit 1 determines 
whether the deleted area is positioned at the start of the 
specified extent. Fig. 29A shows the case where the 
deleted area is positioned at the start of the specified 
extent. When the deleted area is at the start of an extent 
as shown in Fig. 29A, the result "Yes" is given for the 
judgement in step S23, and the processing proceeds to 
step S24. 

[0195] In step S24, the logical block length of the
deleted area is added to the recording start position of
the specified extent and the logical block length of this
extent is reduced by the logical block length of the
deleted area. By doing so, the control unit 1 updates the
recording start position and extent length from those
indicated by the broken lines in Fig. 29A to those
indicated by the solid lines.

[0196] In step S25, the control unit 1 judges whether
the deleted area is positioned at the end of the specified
extent. Fig. 29B shows the case where the deleted area
is positioned at the end of the specified extent. When
the deleted area is at the end of an extent as shown in
Fig. 29B, the result "Yes" is given for the judgement in
step S25, and the processing proceeds to step S26. In
step S26, the logical block length of the present extent
is reduced by the logical block length of the deleted
area. By doing so, the control unit 1 updates the extent
length from the broken line shown in Fig. 29B to the
solid line.

[0197] In step S27, the control unit 1 determines 
whether the deleted area is positioned midway through 
the specified extent. Fig. 29C shows the case where the 
deleted area is positioned midway through the specified 
extent. When the deleted area is midway through an 
extent as shown in Fig. 29C, the result "Yes" is given for
the judgement in step S27, and the processing pro- 
ceeds to step S28. 

[0198] In step S28, the control unit 1 first registers the
stream data that exists after the deleted area in a file
entry as a new extent. The control unit 1 then registers
an allocation descriptor in the file entry. This allocation
descriptor has the first address in the AV data that
follows the deleted area as the recording start position and
the data length of this remaining AV data as the logical
block length.

[0199] Next, in step S29, the recording start position
of the original extent is left as it is, and the logical block
length written in the allocation descriptor for this extent
is reduced by a sum of the logical block length of the
deleted area and the logical block length written in the
allocation descriptor in the new file entry.

[0200] When the result "No" is given in step S27, the
specified extent is to be deleted in its entirety, so that the
processing proceeds to step S31 where the extent is
deleted.

[0201] By repeating the above loop process for each
extent specified in step S22, the control unit 1
completes the partial delete operation.
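
The per-extent processing of steps S23 to S31 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the processing of the control unit 1 itself: an extent is modelled as a hypothetical (start, length) pair in logical blocks, the function name is invented, and the case where the deleted area covers the whole extent is tested first for simplicity, whereas the flowchart reaches that case only after the other judgements give "No".

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (recording start position, logical block length)


def delete_from_extent(extent: Extent, del_start: int, del_len: int) -> List[Extent]:
    """Return the extents that remain after removing the deleted area."""
    start, length = extent
    end = start + length
    lo = max(start, del_start)            # overlap of deleted area and extent
    hi = min(end, del_start + del_len)
    if lo >= hi:
        return [extent]                   # deleted area does not touch this extent
    cut = hi - lo
    if lo == start and hi == end:
        return []                         # whole extent deleted (step S31)
    if lo == start:
        return [(start + cut, length - cut)]   # deleted area at the start (step S24)
    if hi == end:
        return [(start, length - cut)]         # deleted area at the end (step S26)
    tail = (hi, end - hi)                       # new extent after the deleted area (step S28)
    head = (start, length - (cut + tail[1]))    # original extent shortened (step S29)
    return [head, tail]


print(delete_from_extent((100, 50), 120, 10))
```

In the midway case the original extent keeps its recording start position and a second extent is registered for the data that follows the deleted area, matching the description of steps S28 and S29.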
[0202] In the present embodiment, when the total of 
the payload size of audio packs in a VOBU is a non-inte- 
ger multiple of a set of audio frame data, a padding 
packet or stuffing bytes are inserted into a pack to make 
the boundary between VOBUs match a boundary 
between sets of audio frame data. This means that so 
long as a partial delete is performed in VOBU units, 
there is no risk of the partial delete leaving only a former 
or latter part of a set of audio frame data. As a result, by 
updating management information such as file entries 
in units of VOBUs, a recording apparatus can easily per- 
form a partial delete. 

[0203] Even when notification of the storage of the
final video pack in a VOBU is suddenly received, the
process for inserting a padding packet or stuffing bytes
into packs can instantaneously have the boundary of
VOBUs aligned with a boundary between sets of audio
frame data using a technique defined within the buffer
control method of the audio packing unit 15.



[0204] The second embodiment of the present invention
focuses on the storage of sets of audio frame data
in packs at a ratio of one set of audio frame data to one
pack.

[0205] Fig. 30 is a representation of when one set of
audio frame data is stored in each pack.

[0206] The upper part of Fig. 30 shows the VOBUs
produced by multiplexing audio packs and video packs.
The audio pack P61 in these VOBUs is indicated by the
arrows that extend from the audio frame data Z, showing
that this pack only stores the audio frame data Z
shown in the lower part of Fig. 30. If only the audio
frame data Z is stored, an unused area is left in the
audio pack P61. To fill this unused area, a padding
packet is inserted into the audio pack P61.
[0207] In the same way, the audio packs P62, P63,
P64 shown in the upper part of Fig. 30 are respectively
indicated by the arrows that extend from the sets of
audio frame data Z+1, Z+2, Z+3. This shows that these
packs respectively only store the sets of audio frame
data Z+1, Z+2, Z+3. Since only one set of audio frame
data is stored in each audio pack, unused areas are left
in the payloads of each of audio packs P62, P63, and
P64. To fill these unused areas, a padding packet is
inserted into each audio pack.
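
As a rough illustration of this arrangement, the following Python sketch computes the unused area that a padding packet would have to fill when exactly one set of audio frame data is stored per pack. The function name and the byte counts are hypothetical; the actual payload capacity and frame sizes depend on the pack format and the audio coding mode.

```python
def pack_one_frame_each(frame_sizes, payload_capacity):
    """Pair each set of audio frame data with the padding its pack needs.

    Returns a list of (frame_size, padding_size) tuples, one per audio pack.
    """
    layout = []
    for size in frame_sizes:
        if size > payload_capacity:
            raise ValueError("a set of audio frame data exceeds the payload")
        layout.append((size, payload_capacity - size))
    return layout


# Four frames (Z, Z+1, Z+2, Z+3) of an illustrative 768 bytes each,
# stored in packs with an illustrative 2000-byte payload.
print(pack_one_frame_each([768, 768, 768, 768], 2000))
```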
[0208] Fig. 31 shows how the state of the buffer
changes due to the VOBUs shown in Fig. 30. The
bottom part of Fig. 31 shows the same VOBUs as Fig. 30.
The middle part shows a sequence of audio packs that
is obtained by separating audio packs from the VOBUs
shown in the bottom part. The top part of Fig. 31 is a
graph showing the increases in the buffer occupancy of
the audio decoder buffer due to the transfer of the audio
frame data from the packs in the middle part to the
audio decoder buffer.

[0209] Each inclined part in the graph in Fig. 31 starts
to rise at the SCR given to the pack header and falls at
the PTS given to the packet header of the pack. This
shows that the input of each audio pack that stores a set
of audio frame data into the audio decoder buffer is
completed by the presentation start time, at which point
the audio frame data in the audio pack is decoded.

[0210] With the present embodiment, only one set of
audio frame data is stored in each audio pack, so that
the simulation of the buffer state using the virtual
decoder buffer 16 is no longer necessary. This means
that the construction of the system encoder 2e can be
simplified. The scale of the audio decoder buffer can
also be reduced to the size of one set of audio frame
data, which reduces the manufacturing cost of the
recording apparatus.

[0211] The present invention has been described by
way of the above embodiments, though these
embodiments are mere examples of systems that are presently
expected to operate favorably. It should be obvious that
various modifications can be made without departing
from the technical scope of this invention. Seven
representative examples of such modifications are given
below.

(a) In the first embodiment the DVD recorder 70
was described as being a device to be used in
place of a domestic non-portable video tape
recorder. However, when a DVD-RAM is used as a
storage medium for a computer, the following
construction is also possible. The disc access unit 3
may be connected to a computer bus via a SCSI
(Small Computer Systems Interface), an IDE
(Integrated Drive Electronics), or an IEEE (Institute of
Electrical and Electronics Engineers) 1394 interface so
as to operate as a DVD-RAM drive. The components
in Fig. 22 aside from the disc access unit 3
may be realized by computer hardware, the computer
OS (operating system), and application software
that is run on the OS.

When doing so, the procedure shown in the flowchart
in Fig. 27 whereby the audio packing unit 15
uses the virtual decoder buffer 16 to simulate the
buffer state can be achieved by a machine language
program. Such a machine language program
may be distributed and sold having been recorded
on a recording medium. Examples of such a recording
medium are an IC (integrated circuit) card, an
optical disc, or a floppy disc. The machine language
program recorded on the recording medium may
then be installed into a standard computer. By
executing the installed machine language program,
the standard computer can achieve the functions of
the recording apparatus of the first embodiment.

(b) In the embodiments, only video streams and
audio streams were described as being multiplexed
into VOBs. However, a sub-picture stream including
text for subtitles that has been subjected to
run-length compression may also be multiplexed into
VOBs, with the boundaries between VOBs still
being aligned with the boundaries between sets of
audio frame data.

(c) The embodiments describe the case where one
video frame and one audio frame are used as the
units. However, there are cases where one picture
is in fact depicted using 1.5 frames, such as for a
video stream where 3:2 pulldown is used with
images for 24 frames per second being subject to
compression in the same way as with film materials.
This invention does not effectively depend on 3:2
pulldown, so that there is no particular restriction on
the frames used.

(d) In the second embodiment, one set of audio
frame data is stored in one audio pack, although
two or three sets of audio frame data may be stored
in one audio pack, provided that this is within the
capacity of the audio pack.

(e) In the first and second embodiments, Dolby-AC3,
MPEG, and Linear-PCM are given as the audio
coding modes, although the technical effects
described in the embodiments can still be achieved
even if other coding modes are used.

(f) In the first and second embodiments, each pack
only includes one packet, although a pack may 
instead include a plurality of packets, as is the case 
in conventional MPEG methods. 

(g) The first and second embodiments describe an 
example where a DVD-RAM is used, although the 
present invention is not limited to the use of this 
recording medium. The same effects may still be 
achieved if any rewritable medium, such as a hard 
disk drive or an MO drive, is used. 

[0212] Although the present invention has been fully
described by way of examples with reference to the
accompanying drawings, it is to be noted that various
changes and modifications will be apparent to those
skilled in the art. Therefore, unless such changes and
modifications depart from the scope of the present
invention, they should be construed as being included
therein.

Claims 

1. An optical disc for recording video objects that are 
obtained by multiplexing a video stream including a 
plurality of sets of picture data and an audio stream 
including a plurality of sets of audio frame data, 

each video object comprising a plurality of 

video object units whose lengths are within a 

predetermined range, and 

each video object unit storing complete sets of 

picture data and complete sets of audio frame 

data. 

2. The optical disc of Claim 1,

wherein the predetermined range is set so that 
a total presentation period of all of the sets of 
picture data in a video object unit is no longer 
than one second. 

3. The optical disc of Claim 2,

wherein picture groups are formed in the video
stream, each picture group including at least
one set of picture data that has been
intra-encoded, and
each video object unit includes at least one
complete picture group.

4. The optical disc of Claim 3,

wherein the sets of audio frame data included
in each video object unit are divided into packs
of a predetermined length, and
when a length of the audio frame data in a
video object unit is shorter than a total length of
recordable areas in all packs in the video object
unit that store the audio frame data, one of a
padding packet and stuffing byte(s) is inserted
into at least one pack in the video object unit.

5. The optical disc of Claim 4,

wherein when a difference between the length
of the audio frame data in a video object unit
and the total length of recordable areas in all
packs in the video object unit that store the
audio frame data is below a predetermined
number of bytes, stuffing byte(s) are inserted
into one pack in the video object unit.

6. The optical disc of Claim 4,

wherein when the difference between the
length of the audio frame data in a video object
unit and the total length of recording areas in all
packs in the video object unit that store the
audio frame data is at least the predetermined
number of bytes, a padding packet is inserted
into one pack in the video object unit.

7. The optical disc of Claim 3, wherein the sets of
audio frame data included in each video object unit
are divided into packs of a predetermined length,
and each pack in a video object unit includes at
least one complete set of audio frame data and one
of a padding packet and stuffing byte(s).

8. An optical disc that records file management
information and files that store video objects,

each video object being obtained by multiplexing
a video stream and an audio stream,
each video object being an arrangement of a
plurality of video object units,
each video object unit storing complete sets of
picture data and complete sets of audio frame
data,
each file being divided into one or more extents
that are each recorded in a plurality of
consecutive areas on the optical disc,
the file management information including sets
of position information for each extent, each set
of position information being given in units of
video object units and being managed as one
of two states, the two states being "used" and
"unused".



9. A recording apparatus that records a video object 
obtained by multiplexing a video stream that 
includes a plurality of sets of picture data and an 
audio stream that includes a plurality of sets of 
audio frame data onto an optical disc, the recording 
apparatus comprising: 

an encoder for encoding input signals received 
from outside to successively generate sets of 
picture data and sets of audio frame data; 
multiplexing means for successively generating 
video object units that compose a video object 
by successively multiplexing the generated 
sets of picture data and audio frame data, each 
video object unit having a length within a pre- 
determined range and including a plurality of 
complete sets of picture data and a plurality of 
complete sets of audio frame data; and 
recording means for recording the plurality of 
video object units generated by the multiplex- 
ing means onto the optical disc as a video 
object. 

10. The recording apparatus of Claim 9,

wherein the sets of audio frame data included
in each video object unit are divided into packs
of a predetermined length, and
the multiplexing means inserts one of a padding
packet and stuffing byte(s) into at least
one video object unit to ensure that the audio
frame data in each video object unit is
composed of complete sets of audio frame data.

11. The recording apparatus of Claim 10,
wherein the multiplexing means includes: 

a first judging unit for judging, when a video 
object unit is being generated, whether a length 
of the audio frame data in the video object unit 
is shorter than a total length of recordable 
areas in all packs in the video object unit that 
store the audio frame data; and 
an arranging unit for arranging one of a pad- 
ding packet and stuffing byte(s) into the video 
object unit when a judgement of the first judg- 
ing unit is affirmative. 

12. The recording apparatus of Claim 11, 
wherein the multiplexing means further includes: 

a second judging unit for judging, when the 
judgement of the first judging unit is affirmative, 
whether the length of the audio frame data in a 
video object unit is shorter than the total length 
of recordable areas in all packs in the video 
object unit that store the audio frame data by at 
least a predetermined number of bytes,
the arranging unit arranges a padding packet
into the video object unit if a judgement of the
second judging unit is affirmative and stuffing
byte(s) into the video object unit if a judgement
of the second judging unit is negative.

13. The recording apparatus of Claim 12,
wherein the multiplexing means further includes:

a video packing unit for storing each set of
picture data in at least one video pack and for
adding a time stamp, showing an input time at
which the video pack should be inputted into a
decoder buffer, to a pack header of each video
pack;
an audio packing unit for storing audio frame
data, out of the plurality of sets of encoded
audio frame data, of a predetermined size into
an audio pack and for adding a time stamp,
showing an input time at which the audio pack
should be inputted into the decoder buffer, to a
pack header of the audio pack; and
pack arranging means for arranging video
packs that store picture data and audio packs
that store audio frame data in order of the
respective input times for the decoder buffer,
the video packing unit sending a completion
signal to the audio packing unit on storing a
final set of picture data in a video object unit
into one or more video packs, and
on receiving the completion signal, the audio
packing unit storing sets of encoded audio
frame data of the predetermined size into an
audio pack, adding one of a padding packet
and stuffing byte(s) to the audio pack if
necessary.

14. The recording apparatus of Claim 13, further
comprising:

an encoder buffer for storing, in an encoding
order, a plurality of sets of audio frame data
that have been encoded by the encoder,
wherein the audio packing unit extracts audio
frame data of a predetermined size at a front of
the plurality of sets of audio frame data in the
encoder buffer and arranges the extracted
audio frame data into an audio pack to fill the
audio pack.

15. The recording apparatus of Claim 14,

wherein the video packing unit sends a
completion signal to the audio packing unit on
storing a final data division obtained by dividing
one or more picture groups,
the first judging unit determining, when the
audio packing unit has received the completion
signal, whether a first part of a next set of audio
frame data has been stored in an immediately
preceding audio pack,
the first arranging unit arranging, when the first
part of the next set of audio frame data has
been stored in the preceding audio pack, a
remaining part of the next set of audio frame
data and one of a padding packet and stuffing
data into a next audio pack, and
the pack arranging unit arranging the audio
pack that includes one of a padding packet and
stuffing data and the remaining part of the next
set of audio frame data immediately after a
video pack that stores the final data division
obtained by dividing the picture group(s).

16. The recording apparatus of Claim 15,

wherein the second judging unit judges, when
the first part of the next set of audio frame data
has been stored in the immediately preceding
audio pack, whether a difference between a
data size of a remaining part of the next set of
audio frame data and a size of an audio pack is
at least a predetermined number of bytes,
the first arranging unit storing a padding packet
including padding data into the next audio pack
when the difference is at least equal to the
predetermined number of bytes, and storing
stuffing byte(s) as padding data into a packet
header of the next audio pack when the
difference is below the predetermined number of
bytes.

17. The recording apparatus of Claim 14, further
comprising:

a buffer simulating unit for simulating changes
over time in buffer occupancy of an audio
decoder buffer that are caused by (a) input into
the audio decoder of an audio pack that has
been newly generated by the audio packing
unit and (b) decoding of audio frame data that
is already present in the audio decoder buffer;
and
an occupancy calculating unit for calculating a
time at which the buffer occupancy of the audio
decoder buffer falls to a value no greater than a
predetermined upper limit,
wherein when the occupancy calculating unit
calculates the time at which the buffer
occupancy of the audio decoder buffer falls to a
value no greater than a predetermined upper
limit, the audio packing unit extracts audio
frame data of the predetermined size from the
encoded plurality of sets of audio frame data,
stores the extracted audio frame data into a
next audio pack, and adds a time stamp showing
the time calculated by the occupancy calculating
unit to a pack header of the next audio
pack.

18. The recording apparatus of Claim 17, 

wherein the predetermined upper limit is found 
by subtracting (a) the predetermined size of an 
audio pack from (b) an integer multiple of a 
data size of a set of audio frame data, the
integer multiple being within a storage capacity of
the audio decoder buffer. 

19. A recording apparatus for an optical disc that
records file management information and files that
store video objects,

each video object being obtained by multiplexing
a video stream and an audio stream,
each video object being an arrangement of a
plurality of video object units,
each video object unit storing complete sets of
picture data and complete sets of audio frame
data,
each file being divided into one or more extents
that are each recorded in a plurality of
consecutive areas on the optical disc, and
the file management information including sets
of position information for each extent,
the recording apparatus comprising:
partial deletion area receiving means for
accepting an indication of a partial deletion
area from a user, the partial deletion area being
composed of at least one video object unit in a
video object;
detection means for detecting extents that
correspond to the indicated partial deletion area;
and
partial deletion means for performing a partial
deletion by updating the sets of position
information for the detected extents.



20. A computer-readable storage medium for storing a
recording program that records video objects that
are obtained by multiplexing a video stream including
a plurality of sets of picture data and an audio
stream including a plurality of sets of audio frame
data onto an optical disc,
the recording program including:

an encoding step for encoding input signals
received from outside to successively generate
sets of picture data and sets of audio frame
data;
a multiplexing step for successively generating
video object units that compose a video object
by successively multiplexing the generated
sets of picture data and audio frame data, each
video object unit having a length within a
predetermined range and including a plurality of
complete sets of picture data and a plurality of
complete sets of audio frame data; and
a recording step for recording the plurality of
video object units generated by the multiplexing
step onto the optical disc as a video object.

21. The computer-readable storage medium of Claim
20,

wherein the sets of audio frame data included
in each video object unit are divided into packs
of a predetermined length, and
the multiplexing step inserts one of a padding
packet and stuffing byte(s) into at least one
video object unit to ensure that the audio frame
data in each video object unit is composed of
complete sets of audio frame data.

22. The computer-readable storage medium of Claim
21, wherein the multiplexing step includes:

a first judging substep for judging, when a
video object unit is being generated, whether a
length of the audio frame data in the video
object unit is shorter than a total length of
recordable areas in all packs in the video object
unit that store the audio frame data, and
an arranging substep for arranging one of a
padding packet and stuffing byte(s) into the
video object unit when a judgement of the first
judging substep is affirmative.

23. The computer-readable storage medium of Claim
22, wherein the multiplexing step further includes:

a second judging substep for judging, when the
judgement of the first judging substep is
affirmative, whether the length of the audio frame
data in a video object unit is shorter than the
total length of recordable areas in all packs in
the video object unit that store the audio frame
data by at least a predetermined number of
bytes,
the arranging substep arranging a padding
packet into the video object unit when a
judgement of the second judging unit is affirmative
and arranging stuffing byte(s) into the video
object unit when a judgement of the second
judging substep is negative.

24. A computer-readable storage medium for storing a
recording program for an optical disc,

the optical disc recording file management
information and files that store video objects,
each video object being obtained by multiplexing
a video stream and an audio stream,
each video object being an arrangement of a
plurality of video object units,
each video object unit storing complete sets of
picture data and complete sets of audio frame
data,
each file being divided into one or more extents
that are each recorded in a plurality of
consecutive areas on the optical disc, and
the file management information including sets
of position information for each extent,
the recording program comprising:
a partial deletion area receiving step for
accepting an indication of a partial deletion
area from a user, the partial deletion area being
composed of at least one video object unit in a
video object;
a detection step for detecting extents that
correspond to the indicated partial deletion area;
and
a partial deletion step for performing a partial
deletion by updating the sets of position
information for the detected extents.

25. A recording method for recording video objects that
are obtained by multiplexing a video stream including
a plurality of sets of picture data and an audio
stream including a plurality of sets of audio frame
data onto an optical disc,

the recording method including:

an encoding step for encoding input signals
received from outside to successively generate
sets of picture data and sets of audio frame
data;
a multiplexing step for successively generating
video object units that compose a video object
by successively multiplexing the generated
sets of picture data and audio frame data, each
video object unit having a length within a
predetermined range and including a plurality of
complete sets of picture data and a plurality of
complete sets of audio frame data; and
a recording step for recording the plurality of
video object units generated by the multiplexing
step onto the optical disc as a video object.

26. The recording method of Claim 25,

wherein the sets of audio frame data included
in each video object unit are divided into packs
of a predetermined length, and
the multiplexing step inserts one of a padding
packet and stuffing byte(s) into at least one
video object unit to ensure that the audio frame
data in each video object unit is composed of
complete sets of audio frame data.

27. The recording method of Claim 26, 
wherein the multiplexing step includes: 

a first judging substep for judging, when a 
video object unit is being generated, whether a 
length of the audio frame data in the video 
object unit is shorter than a total length of 
recordable areas in all packs in the video object 
unit that store the audio frame data, and 
an arranging substep for arranging one of a 
padding packet and stuffing byte(s) into the 
video object unit when a judgement of the first 
judging substep is affirmative. 

28. The recording method of Claim 27,

wherein the multiplexing step further includes: 

a second judging substep for judging, when the 
judgement of the first judging substep is affirm- 
ative, whether the length of the audio frame 
data in a video object unit is shorter than the 
total length of recordable areas in all packs in 
the video object unit that store the audio frame 
data by at least a predetermined number of 
bytes, 

the arranging substep arranging a padding
packet into the video object unit when a judgement
of the second judging substep is affirmative
and arranging stuffing byte(s) into the video
object unit when a judgement of the second
judging substep is negative.
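
The two judging substeps and the arranging substep of Claims 27 and 28 can be illustrated by a minimal sketch. The names `Fill` and `choose_fill`, the byte-count arguments, and the `threshold` parameter (standing in for the "predetermined number of bytes", whose value is not stated in these claims) are hypothetical and introduced only for illustration; this is not the claimed implementation.

```python
from enum import Enum


class Fill(Enum):
    """Kinds of filler the arranging substep may place (hypothetical names)."""
    NONE = "none"
    STUFFING_BYTES = "stuffing_bytes"
    PADDING_PACKET = "padding_packet"


def choose_fill(audio_len, recordable_len, threshold):
    """Decide how to fill the unused audio payload area of a video object unit.

    audio_len      -- bytes of complete audio frame data in the unit
    recordable_len -- total recordable bytes in the packs that store audio
    threshold      -- assumed stand-in for the "predetermined number of bytes"
    Returns (kind of filler, number of bytes to fill).
    """
    if audio_len < 0 or audio_len > recordable_len:
        raise ValueError("audio data must fit in the recordable area")
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    shortfall = recordable_len - audio_len

    # First judging substep: is the audio data shorter than the area?
    if shortfall == 0:
        return Fill.NONE, 0

    # Second judging substep: is it shorter by at least the threshold?
    if shortfall >= threshold:
        return Fill.PADDING_PACKET, shortfall
    return Fill.STUFFING_BYTES, shortfall
```

For example, with an illustrative threshold of 8 bytes, a shortfall of 5 bytes would select stuffing byte(s), while a shortfall of 8 bytes or more would select a padding packet.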

29. A recording method for an optical disc, 

the optical disc recording file management 
information and files that store video objects, 
each video object being obtained by multiplex- 
ing a video stream and an audio stream, 
each video object being an arrangement of a 
plurality of video object units, 
each video object unit storing complete sets of 
picture data and complete sets of audio frame 
data, 

each file being divided into one or more extents 
that are each recorded in a plurality of consec- 
utive areas on the optical disc, and 
the file management information including sets 
of position information for each extent, 
the recording method comprising: 
a partial deletion area receiving step for 
accepting an indication of a partial deletion 
area from a user, the partial deletion area being 
composed of at least one video object unit in a 
video object; 

a detection step for detecting extents that correspond
to the indicated partial deletion area;
and

a partial deletion step for performing a partial
deletion by updating the sets of position information
for the detected extents.
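
The detection step and the partial deletion step can be illustrated by a minimal sketch. It assumes, purely for illustration, that each set of position information is a (start sector, sector count) pair and that the partial deletion area is given as a sector range measured from the start of the file; the names `Extent` and `partial_delete` are hypothetical. The recorded data itself is not touched: only the position information is rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Assumed form of one set of position information."""
    start: int   # first sector of the extent on the disc
    length: int  # number of consecutive sectors


def partial_delete(extents, del_offset, del_length):
    """Return updated extents with file sectors
    [del_offset, del_offset + del_length) removed from the file."""
    total = sum(e.length for e in extents)
    if del_offset < 0 or del_length <= 0 or del_offset + del_length > total:
        raise ValueError("deletion area must lie inside the file")

    del_end = del_offset + del_length
    updated = []
    file_pos = 0  # offset of the current extent within the file

    for ext in extents:
        ext_begin = file_pos
        ext_end = file_pos + ext.length
        file_pos = ext_end

        # Detection: extents that do not overlap the area are kept as-is.
        if ext_end <= del_offset or ext_begin >= del_end:
            updated.append(ext)
            continue

        # Keep the part of the extent that precedes the deletion area.
        if ext_begin < del_offset:
            updated.append(Extent(ext.start, del_offset - ext_begin))

        # Keep the part of the extent that follows the deletion area.
        if ext_end > del_end:
            skipped = del_end - ext_begin
            updated.append(Extent(ext.start + skipped, ext_end - del_end))

    return updated
```

Under these assumptions, deleting from the middle of one extent splits it into two, and an extent lying wholly inside the deletion area simply drops out of the list.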
FIG. 24
FIG. 25